[jira] Commented: (HADOOP-1824) want InputFormat for zip files

Ankur (JIRA) Wed, 30 Jan 2008 02:20:59 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563939#action_12563939
 ]


Ankur commented on HADOOP-1824:
-------------------------------

OK, but I should be able to move the offset to the end of the stream, since 
the central directory structure of a zip file is at the end.
At present, FSDataInputStream.seek() throws an IOException and does not change 
the stream position if I try to position it past the
end of the stream. This is unlike fseek(), which positions the offset at the 
end of the stream.

Is there a workaround for this, or is it functionality that needs to be added?
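
One possible workaround, sketched here as an illustration and not taken from the attached patch: instead of seeking past the end, take the file length from metadata, seek to a position at or before the end, and scan the tail backwards for the end-of-central-directory (EOCD) record. The sketch uses java.io.RandomAccessFile so that it is self-contained. The class and method names (ZipTailLocator, tailStart, findEocd, centralDirectoryOffset, readTail) are hypothetical. With Hadoop, the assumption is that the length would come from the file system's metadata for the path and that FSDataInputStream.seek() would take the place of RandomAccessFile.seek().

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: locates the zip end-of-central-directory (EOCD)
// record by reading only the tail of the file.
final class ZipTailLocator {
    static final int EOCD_MIN_SIZE = 22;        // fixed part of the EOCD record
    static final int MAX_COMMENT_SIZE = 0xFFFF; // the comment length field is 16 bits

    // First offset that could contain the EOCD record. The result never
    // exceeds fileLength, so a seek to it stays inside the stream.
    static long tailStart(long fileLength) {
        return Math.max(0L, fileLength - (EOCD_MIN_SIZE + MAX_COMMENT_SIZE));
    }

    // Scans backwards for the signature "PK\005\006". A candidate is accepted
    // only if its comment length field makes the record end exactly at the
    // end of the buffer. Returns the index of the record, or -1.
    static int findEocd(byte[] tail) {
        for (int i = tail.length - EOCD_MIN_SIZE; i >= 0; i--) {
            if (tail[i] == 0x50 && tail[i + 1] == 0x4b
                    && tail[i + 2] == 0x05 && tail[i + 3] == 0x06) {
                int commentLength = (tail[i + 20] & 0xff) | ((tail[i + 21] & 0xff) << 8);
                if (i + EOCD_MIN_SIZE + commentLength == tail.length) {
                    return i;
                }
            }
        }
        return -1;
    }

    // Reads the little-endian 32-bit "offset of start of central directory"
    // field, stored 16 bytes into the EOCD record.
    static long centralDirectoryOffset(byte[] tail, int eocd) {
        int p = eocd + 16;
        return (tail[p] & 0xffL)
                | ((tail[p + 1] & 0xffL) << 8)
                | ((tail[p + 2] & 0xffL) << 16)
                | ((tail[p + 3] & 0xffL) << 24);
    }

    // Reads the tail of the file that may hold the EOCD record.
    static byte[] readTail(RandomAccessFile file) throws IOException {
        long length = file.length();
        long start = tailStart(length);
        byte[] tail = new byte[(int) (length - start)];
        file.seek(start);
        file.readFully(tail);
        return tail;
    }
}
```

The central directory can then be read with one more seek to the offset returned by centralDirectoryOffset(). This sketch ignores the zip64 variants of these records.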

> want InputFormat for zip files
> ------------------------------
>
>                 Key: HADOOP-1824
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1824
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.15.2
>            Reporter: Doug Cutting
>         Attachments: ZipInputFormat_fixed.patch
>
>
> HDFS is inefficient with large numbers of small files.  Thus one might pack 
> many small files into large, compressed archives.  But, for efficient 
> map-reduce operation, it is desirable to be able to split inputs into 
> smaller chunks, with one or more small original files per split.  The zip 
> format, unlike tar, permits enumeration of files in the archive without 
> scanning the entire archive.  Thus a zip InputFormat could efficiently permit 
> splitting large archives into splits that contain one or more archived files.
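
The grouping of archived files into splits described above could look roughly like the following. This is only a sketch of one greedy strategy, not the behaviour of ZipInputFormat_fixed.patch: it assumes the compressed entry sizes have already been read from the central directory, and the names ZipSplitPlanner, planSplits, and targetBytes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner: groups consecutive zip entries into splits so that
// each split holds one or more whole entries.
final class ZipSplitPlanner {
    // Returns one {firstEntryIndex, entryCount} pair per split. A split is
    // closed when adding the next entry would push it past targetBytes; an
    // entry larger than targetBytes gets a split of its own.
    static List<int[]> planSplits(long[] compressedSizes, long targetBytes) {
        List<int[]> splits = new ArrayList<>();
        int first = 0;      // index of the first entry in the current split
        int count = 0;      // number of entries in the current split
        long bytes = 0;     // compressed bytes in the current split
        for (int i = 0; i < compressedSizes.length; i++) {
            if (count > 0 && bytes + compressedSizes[i] > targetBytes) {
                splits.add(new int[] {first, count});
                first = i;
                count = 0;
                bytes = 0;
            }
            count++;
            bytes += compressedSizes[i];
        }
        if (count > 0) {
            splits.add(new int[] {first, count});
        }
        return splits;
    }
}
```

Because entries are never divided, a split boundary always falls between archived files, which is the property the description asks for.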

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
