[jira] Updated: (HADOOP-1824) want InputFormat for zip files

Ankur (JIRA) Wed, 23 Jan 2008 02:50:55 -0800

     [ https://issues.apache.org/jira/browse/HADOOP-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ankur updated HADOOP-1824:
--------------------------

    Attachment: ZipInputFormat_fixed.patch

The following issues reported by QA have been fixed:

1. Findbugs errors in ZipInputFormat.java were fixed. The streams are now
closed properly in the isSplitable() and getSplits() methods.
2. Javadoc comments were fixed, and I verified that no new javadoc warnings
are generated after applying the patch.
3. Code formatting was fixed.
4. core-tests and contrib-tests now pass after the above changes.

Kindly verify.

> want InputFormat for zip files
> ------------------------------
>
>                 Key: HADOOP-1824
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1824
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.15.2
>            Reporter: Doug Cutting
>         Attachments: ZipInputFormat_fixed.patch
>
>
> HDFS is inefficient with large numbers of small files.  Thus one might pack 
> many small files into large, compressed archives.  But, for efficient 
> map-reduce operation, it is desirable to be able to split inputs into 
> smaller chunks, with one or more small original files per split.  The zip 
> format, unlike tar, permits enumeration of files in the archive without 
> scanning the entire archive.  Thus a zip InputFormat could efficiently permit 
> splitting large archives into splits that contain one or more archived files.
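
The idea in the description can be illustrated with a minimal sketch. It uses
only java.util.zip from the JDK and is not the code in the attached patch.
The class name ZipEntrySplitSketch, the method planSplits, and the parameter
maxEntriesPerSplit are hypothetical names chosen for illustration. ZipFile
reads the archive's central directory, so the entries can be listed without
decompressing their contents, and the names are then grouped into candidate
splits.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical illustration only; not the ZipInputFormat from the patch.
class ZipEntrySplitSketch {

    /**
     * Lists the file entries of a zip archive and groups their names into
     * splits holding at most maxEntriesPerSplit entries each.
     */
    static List<List<String>> planSplits(File archive, int maxEntriesPerSplit)
            throws IOException {
        if (maxEntriesPerSplit < 1) {
            throw new IllegalArgumentException("maxEntriesPerSplit must be >= 1");
        }
        List<List<String>> splits = new ArrayList<>();
        List<String> current = new ArrayList<>();
        // try-with-resources closes the archive even if enumeration fails.
        try (ZipFile zip = new ZipFile(archive)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // directory entries carry no file data
                }
                current.add(entry.getName());
                if (current.size() == maxEntriesPerSplit) {
                    splits.add(current);
                    current = new ArrayList<>();
                }
            }
        }
        if (!current.isEmpty()) {
            splits.add(current); // last, possibly smaller, split
        }
        return splits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: ZipEntrySplitSketch <archive.zip>");
            return;
        }
        for (List<String> split : planSplits(new File(args[0]), 2)) {
            System.out.println(split);
        }
    }
}
```

A real InputFormat would more likely balance splits by compressed size
(ZipEntry.getCompressedSize()) than by entry count; counting entries just
keeps the sketch short.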

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
