[ https://issues.apache.org/jira/browse/MAPREDUCE-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879204#action_12879204 ]

Greg Roelofs commented on MAPREDUCE-469:
----------------------------------------


I can't think of a _good_ use case for it, but a few years of development
experience have taught me that, planetwide, someone else very well may have.
Hence my question to mapreduce-user.

The one thing that did occur to me was financially oriented, e.g., if an
existing data flow just fits within budget (whether actual dollars or grid
capacity or time limits or whatever) because it's been reading half of the
available data and getting "good enough" results.  Suddenly doubling its
usage (or 50x, as in your case) without adequate warning could be quite
painful.  Admittedly, this is a weak example and probably very unlikely,
but there may be a real case that's somewhat similar.

Note that I'm perfectly willing to hardcode concatenated-stream support
always-on, just like the trunk's bzip2 code; that would simplify the code,
eliminate a per-bufferload conditional, and generally be cleaner.  I'm also
happy to leave it
configurable but on by default.  However, I'd like to give the user community
a chance to pipe up in case there actually is a problematic use case out
there that you and I have overlooked.
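To make the "configurable but on by default" option concrete, here is a minimal Python sketch. It is not Hadoop's Java codec code, and `decompress_stream` and its `concatenated` flag are hypothetical names for illustration. Data arrives in buffer-sized chunks; whenever a gzip member ends inside a buffer, the flag decides whether to start a fresh decoder on the leftover bytes (read all members) or stop (the old first-member-only behavior). Trailing non-gzip bytes would raise `zlib.error`; the sketch does not handle them.

```python
import gzip
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # tell zlib to expect a gzip header and trailer


def decompress_stream(chunks, concatenated=True):
    """Decompress gzip data that arrives in buffer-sized chunks.

    `concatenated` is a hypothetical switch standing in for the option
    under discussion: True decodes every member (RFC 1952 multi-member),
    False stops after the first member.
    """
    out = []
    decomp = zlib.decompressobj(wbits=GZIP_WBITS)
    for chunk in chunks:
        data = chunk
        while data:
            out.append(decomp.decompress(data))
            if not decomp.eof:
                break  # current member continues in the next buffer
            # A member ended inside this buffer; the flag decides what happens next.
            if not concatenated:
                return b"".join(out)
            data = decomp.unused_data  # start of the next member, if any
            decomp = zlib.decompressobj(wbits=GZIP_WBITS)
    return b"".join(out)


if __name__ == "__main__":
    blob = gzip.compress(b"hello ") + gzip.compress(b"world")
    chunks = [blob[i:i + 7] for i in range(0, len(blob), 7)]  # small buffers
    print(decompress_stream(chunks))                      # b'hello world'
    print(decompress_stream(chunks, concatenated=False))  # b'hello '
```

Hardcoding the behavior always-on, as with the trunk's bzip2 code, would amount to dropping the `concatenated` parameter and its early `return`, leaving only the "start a new decoder on the leftover bytes" path.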


> Support concatenated gzip and bzip2 files
> -----------------------------------------
>
>                 Key: MAPREDUCE-469
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-469
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Tom White
>            Assignee: Greg Roelofs
>         Attachments: grr-hadoop-common.dif.20100614c, grr-hadoop-mapreduce.dif.20100614c
>
>
> When running MapReduce with concatenated gzip files as input, only the first 
> part is read, which is confusing, to say the least. Concatenated gzip is 
> described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage 
> and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at 
> http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)
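The reported symptom can be reproduced outside Hadoop with a small Python illustration (not Hadoop's Java codecs; the helper names are hypothetical). It builds a two-member gzip file the way `cat a.gz b.gz > c.gz` would, plus a two-stream bzip2 file, and shows that a decoder treating the input as a single stream returns only the first part, leaving the rest in `unused_data`, while Python's `gzip.decompress` and `bz2.decompress` read every member.

```python
import bz2
import gzip
import zlib

# Two independently compressed members back to back, as `cat a.gz b.gz > c.gz` produces.
gz_blob = gzip.compress(b"part one\n") + gzip.compress(b"part two\n")
bz_blob = bz2.compress(b"part one\n") + bz2.compress(b"part two\n")


def first_member_only_gzip(blob):
    """Decode as a single gzip stream; everything after member 1 is left unread."""
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # expect a gzip wrapper
    return d.decompress(blob), d.unused_data


def first_stream_only_bz2(blob):
    """Same single-stream behavior with a bzip2 decompressor object."""
    d = bz2.BZ2Decompressor()
    return d.decompress(blob), d.unused_data


if __name__ == "__main__":
    print(first_member_only_gzip(gz_blob)[0])  # b'part one\n'
    print(gzip.decompress(gz_blob))            # b'part one\npart two\n'
    print(first_stream_only_bz2(bz_blob)[0])   # b'part one\n'
    print(bz2.decompress(bz_blob))             # b'part one\npart two\n'
```

This only illustrates the file-format behavior; it says nothing about how the attached patches implement the fix.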

-- 
This message is automatically generated by JIRA.