[ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833453#action_12833453 ]
Ted Yu commented on HBASE-2225:
-------------------------------

Using a command-line switch is fine.
I think we can make this feature more versatile by naming the switch no_compression_export, meaning that GzipCodec is used for Export by default.
We detect the compression mode of the table first. If the table is compressed, we don't apply GzipCodec. Otherwise we apply GzipCodec unless no_compression_export is specified.
Since SequenceFileInputFormat is able to handle GzipCodec, this won't cause a regression for the Import class.

> Enable compression in HBase Export
> ----------------------------------
>
> Key: HBASE-2225
> URL: https://issues.apache.org/jira/browse/HBASE-2225
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: util
> Affects Versions: 0.20.1
> Environment: OS agnostic
> Reporter: Ted Yu
> Priority: Minor
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> org.apache.hadoop.hbase.mapreduce.Export should set compression codec
> In createSubmittableJob(), the following should be added:
>     FileOutputFormat.setCompressOutput(job, true);
>     FileOutputFormat.setOutputCompressorClass(job,
>         org.apache.hadoop.io.compress.GzipCodec.class);
> From my experiment, 10% to 50% reduction in Export output has been observed.
> SequenceFileInputFormat used by the Import tool is able to detect GzipCodec -
> there is no change for Import class.
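
For reference, the rule proposed in the comment (skip GzipCodec when the table is already compressed, otherwise apply it unless no_compression_export is given) could be sketched roughly as below. This is only an illustration in plain Java with no Hadoop or HBase dependency. The class ExportCompressionPolicy and its methods are hypothetical names, matching the switch as a bare argument is an assumption, and how the table's compression mode is detected is left out.

```java
import java.util.Arrays;

// Hypothetical helper, not part of HBase: it models only the decision rule
// from the comment and does not touch any Hadoop or HBase API.
final class ExportCompressionPolicy {

    // Switch name proposed in the comment. Matching it as a bare argument
    // (no leading dashes) is an assumption made for this sketch.
    static final String NO_COMPRESSION_SWITCH = "no_compression_export";

    private ExportCompressionPolicy() {}

    // Returns true if the switch appears among the command-line arguments.
    static boolean hasNoCompressionSwitch(String[] args) {
        return Arrays.asList(args).contains(NO_COMPRESSION_SWITCH);
    }

    // Decides whether Export should gzip its output:
    //  - table already compressed: do not apply GzipCodec
    //  - otherwise: apply GzipCodec unless the switch was given
    static boolean shouldApplyGzip(boolean tableCompressed, boolean noCompressionExport) {
        if (tableCompressed) {
            return false;
        }
        return !noCompressionExport;
    }

    public static void main(String[] args) {
        // The table's compression state is assumed to be known here;
        // detecting it is outside the scope of this sketch.
        boolean tableCompressed = false;
        boolean gzip = shouldApplyGzip(tableCompressed, hasNoCompressionSwitch(args));
        // When gzip is true, createSubmittableJob() would make the two
        // FileOutputFormat calls quoted in the issue description.
        System.out.println("apply GzipCodec: " + gzip);
    }
}
```

With this rule, Import needs no matching switch, since the issue states that SequenceFileInputFormat detects GzipCodec on its own.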