[ 
https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275375#comment-15275375
 ] 

Suraj Nayak commented on HADOOP-13114:
--------------------------------------

[~raviprak] : Regarding your 
[comment|https://issues.apache.org/jira/browse/HADOOP-8065?focusedCommentId=15269857&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15269857]
 on reusing the codec instead of creating a new one for each file, here are my 
thoughts and questions:
* {{org.apache.hadoop.io.compress.CompressionCodec}} has a static nested 
{{Util}} class that contains the {{createOutputStreamWithCodecPool}} method. Do 
you think it is a good idea to make the class and method public?
* I thought of copying the {{createOutputStreamWithCodecPool}} code into 
{{DistCpUtils}}, but that would result in code duplication. What would you 
suggest for making this code reusable?
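
To make the reuse question concrete, here is a minimal sketch of the borrow-and-return pattern that a compressor pool provides. It is an illustration only, not the Hadoop implementation: it uses {{java.util.zip.Deflater}} as a stand-in so that it runs without Hadoop on the classpath, and the {{DeflaterPool}} class with its {{borrow}}, {{giveBack}}, {{createdCount}} and {{compress}} methods is a hypothetical name invented for this example. In Hadoop the corresponding calls would be {{CodecPool.getCompressor}} and {{CodecPool.returnCompressor}}.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

/** Hypothetical stand-in for a codec pool; not part of Hadoop or DistCp. */
final class DeflaterPool {
    private final Deque<Deflater> idle = new ArrayDeque<>();
    private int created = 0;

    /** Borrow an idle compressor, creating one only when the pool is empty. */
    synchronized Deflater borrow() {
        Deflater d = idle.pollFirst();
        if (d == null) {
            d = new Deflater();
            created++;
        }
        return d;
    }

    /** Reset the compressor's state and make it available for the next file. */
    synchronized void giveBack(Deflater d) {
        d.reset();
        idle.addFirst(d);
    }

    /** Number of compressors created so far (for illustration and testing). */
    synchronized int createdCount() {
        return created;
    }

    /** Compress one file's bytes with a pooled compressor, then return it. */
    byte[] compress(byte[] data) throws IOException {
        Deflater d = borrow();
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            // Closing the stream finishes the deflate data but does not end
            // the Deflater, because the Deflater was supplied by the caller.
            try (OutputStream out = new DeflaterOutputStream(sink, d)) {
                out.write(data);
            }
            return sink.toByteArray();
        } finally {
            giveBack(d);
        }
    }
}
```

Calling {{compress}} once per copied file would then reuse a single compressor for sequential copies instead of allocating a new one each time. Whether such a helper belongs in {{DistCpUtils}} or in a shared public utility is exactly the open question above.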

> DistCp should have option to compress data on write
> ---------------------------------------------------
>
>                 Key: HADOOP-13114
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13114
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Suraj Nayak
>            Assignee: Suraj Nayak
>            Priority: Minor
>              Labels: distcp
>             Fix For: 3.0.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The DistCp utility should be able to store data in a user-specified 
> compression format. This avoids the extra step of compressing data after 
> transfer. Backup strategies that copy to a different cluster also benefit by 
> saving one I/O operation to and from HDFS, which saves resources, time, and effort.
> * Create an option -compressOutput that defaults to 
> {{org.apache.hadoop.io.compress.BZip2Codec}}. 
> * Users will be able to change the codec with {{-D 
> mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}.
> * If DistCp compression is enabled, suffix the file names with the codec's 
> default extension to indicate that the file is compressed, so that users know 
> which codec was used to compress the data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
