[
https://issues.apache.org/jira/browse/HADOOP-13340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534993#comment-16534993
]
Koji Noguchi commented on HADOOP-13340:
---------------------------------------
I do sometimes want this compression feature, for example when keeping a backup
copy of our users' directories or when har-archiving a bunch of job history and
config files. And yes, "transparent compression" with the overhead of decoding
up to an entire codec block would be nice.
However, in addition to the overhead of finding the head of the original file,
there is another overhead when users need to perform random reads on the
original files. As I understand it, the suggested design would only allow us to
decompress from the head of the file.
If we have a Hadoop job with 10 mappers reading from a single text file, this
would be hard to do with the proposed compressed har, because each mapper
tries to read the text file from a specific offset.
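
The cost of a positional read on a head-only stream can be illustrated with a
small Python sketch. This is not Hadoop code: the helper read_at, the chunk
size, and the sample data are all hypothetical, and gzip stands in for any
codec that can only be decoded from the start of the stream.

```python
import gzip
import zlib

CHUNK = 4096  # compressed bytes fed to the inflater per step (arbitrary)


def read_at(compressed: bytes, offset: int, length: int):
    """Positional read on a gzip stream.

    Returns (data, decoded), where `decoded` is the number of uncompressed
    bytes that had to be produced to serve the read. A plain gzip stream has
    no index, so everything before `offset` is inflated and thrown away.
    """
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16+ selects gzip framing
    decoded = 0
    result = bytearray()
    for start in range(0, len(compressed), CHUNK):
        piece = inflater.decompress(compressed[start:start + CHUNK])
        piece_start = decoded
        decoded += len(piece)
        # Keep only the part of this piece that overlaps the requested range.
        lo = max(offset - piece_start, 0)
        hi = min(offset + length - piece_start, len(piece))
        if lo < hi:
            result += piece[lo:hi]
        if decoded >= offset + length:
            break
    return bytes(result), decoded


def demo(num_mappers: int = 10):
    """Simulate mappers that each start reading at their own split offset."""
    data = b"".join(b"line %d\n" % i for i in range(100000))
    compressed = gzip.compress(data)
    split = len(data) // num_mappers
    # For each mapper: (split offset, uncompressed bytes decoded to get there).
    return [(m * split, read_at(compressed, m * split, 16)[1])
            for m in range(num_mappers)]


if __name__ == "__main__":
    for offset, decoded in demo():
        print("split offset", offset, "decoded bytes", decoded)
```

In this sketch the mapper assigned the last split decodes nearly the whole
file before it sees its first record, so the total decoding work grows roughly
with the square of the number of splits.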
Maybe we can live with a semi-transparent hadoop-archive compression that would
only let you read from the head of each file? This would be similar to the old
hftp implementation, where we didn't allow seeks or positional reads.
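
As a rough illustration of what "semi-transparent" could mean, here is a
minimal Python sketch of a reader that decompresses sequentially and rejects
seeks and positional reads. The class HeadOnlyReader and its pread method are
hypothetical names for this example, not an existing Hadoop or har API.

```python
import gzip
import io


class HeadOnlyReader:
    """Sequential-only view of one gzip-compressed archive entry (sketch)."""

    def __init__(self, compressed: bytes):
        self._stream = gzip.GzipFile(fileobj=io.BytesIO(compressed), mode="rb")
        self._pos = 0  # position in the uncompressed data

    def read(self, n: int = -1) -> bytes:
        # Forward reads are the only supported access pattern.
        data = self._stream.read(n)
        self._pos += len(data)
        return data

    def tell(self) -> int:
        return self._pos

    def seekable(self) -> bool:
        return False

    def seek(self, offset: int, whence: int = io.SEEK_SET):
        raise io.UnsupportedOperation("seek is not supported on this entry")

    def pread(self, offset: int, n: int) -> bytes:
        raise io.UnsupportedOperation("positional read is not supported")


if __name__ == "__main__":
    reader = HeadOnlyReader(gzip.compress(b"job history and configs"))
    print(reader.read(3))
    try:
        reader.seek(10)
    except io.UnsupportedOperation as exc:
        print("rejected:", exc)
```

Failing fast on seek keeps the behaviour explicit: callers that need splits or
random access learn immediately that the entry only supports reading from its
head.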
> Compress Hadoop Archive output
> ------------------------------
>
> Key: HADOOP-13340
> URL: https://issues.apache.org/jira/browse/HADOOP-13340
> Project: Hadoop Common
> Issue Type: New Feature
> Components: tools
> Affects Versions: 2.5.0
> Reporter: Duc Le Tu
> Priority: Major
> Labels: features, performance
>
> Why can't the Hadoop Archive tool compress its output like other MapReduce jobs?
> I used options such as -D mapreduce.output.fileoutputformat.compress=true
> -D
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
> but it does not work. Did I do something wrong?
> If not, please add an option to compress the output of the Hadoop Archive tool.
> It is very necessary for data retention for everyone (the small-files problem
> and data compression).