[jira] Commented: (HADOOP-5014) Support concatenated gzip files

Oscar Gothberg (JIRA) Mon, 12 Jan 2009 12:20:34 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663073#action_12663073 ]


Oscar Gothberg commented on HADOOP-5014:
----------------------------------------

Thanks, Tom, for filing the Jira on this issue.

Does anyone know of an alternate gzip codec that could just be plugged 
into Hadoop? I suddenly have terabytes of gzipped data that I'm not sure I 
can trust Hadoop with, and it would be a big pain to re-archive everything.

> Support concatenated gzip files
> -------------------------------
>
>                 Key: HADOOP-5014
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5014
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: io, mapred
>    Affects Versions: 0.17.0, 0.19.0
>            Reporter: Tom White
>
> When running MapReduce with concatenated gzip files as input, only the first 
> part is read, which is confusing, to say the least. Concatenated gzip is 
> described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage 
> and in http://www.ietf.org/rfc/rfc1952.txt. (See the original report at 
> http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html.)
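
For readers unfamiliar with the format, here is a minimal sketch of the behavior described in the issue. It is written in Python rather than Hadoop's Java and is not Hadoop code. The function names are hypothetical and chosen for illustration only. It builds a file from two gzip members, then contrasts a reader that stops after the first member with one that keeps going until the input is exhausted, which RFC 1952 permits.

```python
import gzip
import zlib

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def make_concatenated_gzip(parts):
    """Compress each part as its own gzip member and join the members."""
    return b"".join(gzip.compress(p) for p in parts)


def read_first_member_only(data):
    """Decode one gzip member and stop, ignoring any members that follow."""
    d = zlib.decompressobj(GZIP_WBITS)
    return d.decompress(data)


def read_all_members(data):
    """Decode every member in turn until no input remains."""
    out = []
    while data:
        d = zlib.decompressobj(GZIP_WBITS)
        out.append(d.decompress(data))
        data = d.unused_data  # bytes left over after this member's trailer
    return b"".join(out)


if __name__ == "__main__":
    blob = make_concatenated_gzip([b"first\n", b"second\n"])
    print(read_first_member_only(blob))
    print(read_all_members(blob))
```

A reader like `read_first_member_only` returns without error, so the missing data is easy to overlook. A plugged-in codec would need the looping behavior of `read_all_members`.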

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
