[ https://issues.apache.org/jira/browse/MAPREDUCE-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881058#action_12881058 ]

Greg Roelofs commented on MAPREDUCE-469:
----------------------------------------

Oh, two more review questions:
 - DecompressorStream currently supports two concatenation modes via a 
pseudo-ifdef ("final boolean useResetPartially"): resetPartially(), which 
avoids any additional buffer copies at the cost of uglifying the Decompressor 
interface with this new method; or regular reset() + setInput() to recopy any 
"excess" bytes (that is, bytes from stream N+1) at the end of stream N.  The 
amount of recopying in the latter case depends on the buffer sizes (typically 
64KB around here) and the sizes of the concatenated gzip streams/members, but 
it is bounded by one buffer's worth per member boundary, so in general it 
won't be much.  Barring strong disagreement, I'll go with the latter approach 
and clean up all the resetPartially() stuff in the next (hopefully final) 
version of the patch.
 - Any last-minute qualms about hardcoding the concatenation behavior (i.e., 
always reading every concatenated member rather than making it configurable)?  
It would simplify the patch slightly and seems to be the preferred approach, 
so that's my plan for the next version.
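To make the reset() + setInput() mode concrete, here is a minimal sketch in plain Java. It is not the patch's DecompressorStream code: it uses java.util.zip.Inflater in raw (nowrap) mode instead of Hadoop's Decompressor interface, parses the RFC 1952 member framing itself, and assumes the whole compressed input fits in one byte array and has well-formed headers, so the "recopy" collapses to a setInput() at a new offset. ConcatGunzip, gunzipAll, skipHeader, readLE32, and gzip are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

final class ConcatGunzip {
    // Gzip header flag bits (RFC 1952, section 2.3.1).
    private static final int FHCRC = 2, FEXTRA = 4, FNAME = 8, FCOMMENT = 16;

    /** Skips the gzip member header at pos; assumes the header is well formed. */
    static int skipHeader(byte[] d, int pos) throws IOException {
        if (pos + 10 > d.length || (d[pos] & 0xff) != 0x1f
                || (d[pos + 1] & 0xff) != 0x8b || d[pos + 2] != 8) {
            throw new IOException("not a gzip member at offset " + pos);
        }
        int flg = d[pos + 3] & 0xff;
        pos += 10; // ID1 ID2 CM FLG MTIME(4) XFL OS
        if ((flg & FEXTRA) != 0) {
            pos += 2 + ((d[pos] & 0xff) | (d[pos + 1] & 0xff) << 8);
        }
        if ((flg & FNAME) != 0) {
            while (d[pos++] != 0) { } // zero-terminated original file name
        }
        if ((flg & FCOMMENT) != 0) {
            while (d[pos++] != 0) { } // zero-terminated comment
        }
        if ((flg & FHCRC) != 0) {
            pos += 2; // header CRC16
        }
        return pos;
    }

    /** Reads a little-endian 32-bit value, as used in the gzip trailer. */
    static int readLE32(byte[] d, int pos) {
        return (d[pos] & 0xff) | (d[pos + 1] & 0xff) << 8
                | (d[pos + 2] & 0xff) << 16 | (d[pos + 3] & 0xff) << 24;
    }

    /** Decompresses every member of a concatenated gzip byte array. */
    static byte[] gunzipAll(byte[] data) throws IOException, DataFormatException {
        Inflater inflater = new Inflater(true); // raw deflate; gzip framing parsed here
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64 * 1024];
        int pos = 0;
        try {
            while (pos < data.length) {
                pos = skipHeader(data, pos);
                // reset() + setInput(): reuse one Inflater and hand it the bytes
                // from this member onward.
                inflater.reset();
                inflater.setInput(data, pos, data.length - pos);
                CRC32 crc = new CRC32();
                long size = 0;
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    if (n == 0 && !inflater.finished()
                            && (inflater.needsInput() || inflater.needsDictionary())) {
                        throw new IOException("truncated gzip member");
                    }
                    crc.update(buf, 0, n);
                    out.write(buf, 0, n);
                    size += n;
                }
                // Unconsumed input starts with this member's 8-byte trailer.
                pos = data.length - inflater.getRemaining();
                if (pos + 8 > data.length
                        || readLE32(data, pos) != (int) crc.getValue()
                        || readLE32(data, pos + 4) != (int) size) { // ISIZE is size mod 2^32
                    throw new IOException("bad or missing gzip trailer");
                }
                pos += 8; // the next member, if any, starts here
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Compresses one buffer as a single gzip member (used by the demo). */
    static byte[] gzip(byte[] in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(in);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream cat = new ByteArrayOutputStream();
        cat.write(gzip("hello, ".getBytes(StandardCharsets.UTF_8)));
        cat.write(gzip("world\n".getBytes(StandardCharsets.UTF_8)));
        System.out.print(new String(gunzipAll(cat.toByteArray()), StandardCharsets.UTF_8));
    }
}
```

Running main should print `hello, world`, i.e. both members rather than just the first. In a streaming version with a fixed 64KB input buffer, the bytes reported by getRemaining() after a member finishes would be the "excess" bytes mentioned above; they would be copied back into the inflater with setInput() after reset(), which is the extra copy that resetPartially() was meant to avoid.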

> Support concatenated gzip and bzip2 files
> -----------------------------------------
>
>                 Key: MAPREDUCE-469
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-469
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Tom White
>            Assignee: Greg Roelofs
>         Attachments: grr-hadoop-common.dif.20100614c, 
> grr-hadoop-mapreduce.dif.20100614c, MR-469.v2.yahoo-0.20.2xx-branch.patch
>
>
> When running MapReduce with concatenated gzip files as input, only the first 
> part is read, which is confusing, to say the least. Concatenated gzip is 
> described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage 
> and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at 
> http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)

-- 
This message is automatically generated by JIRA.