[ https://issues.apache.org/jira/browse/HBASE-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925977#action_12925977 ]

Kannan Muthukkaruppan commented on HBASE-3166:
----------------------------------------------

Yes, I ran into this recently.

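It turns out that compression is already possible. The Export tool uses GenericOptionsParser, which allows arbitrary configuration settings to be passed as -D options, so the standard MapReduce output-compression properties can be set on the command line: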

{code}
bin/hadoop jar <pathToHBaseJar> export \
  -D mapred.output.compress=true \
  -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
  -D mapred.output.compression.type=BLOCK \
  <tablename> <outputdirname>
{code}
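For readers wondering how these -D settings reach the job: roughly, the generic options parser strips each `-D key=value` pair out of the argument list and copies it into the job configuration, leaving only the tool's own positional arguments (table name, output directory). The Java below is a simplified, hypothetical mimic of that behaviour using only the JDK. It is not Hadoop's GenericOptionsParser, it handles nothing but `-D`, and the class name, table name and output path are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified mimic of "-D key=value" handling.
 * NOT Hadoop's GenericOptionsParser: each -D pair is copied into a
 * plain map standing in for the job configuration, and everything
 * else is returned as the tool's remaining arguments.
 */
class GenericOptionsSketch {

    /** Consumes "-D key=value" pairs into conf and returns the remaining args in order. */
    static List<String> parse(String[] args, Map<String, String> conf) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                String pair = args[++i];
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    throw new IllegalArgumentException("Expected key=value after -D: " + pair);
                }
                // Split on the first '=' only, so values may themselves contain '='.
                conf.put(pair.substring(0, eq), pair.substring(eq + 1));
            } else {
                // Not a -D option (or a dangling trailing -D): keep it for the tool.
                remaining.add(args[i]);
            }
        }
        return remaining;
    }

    public static void main(String[] argv) {
        String[] args = {
            "-D", "mapred.output.compress=true",
            "-D", "mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec",
            "-D", "mapred.output.compression.type=BLOCK",
            "mytable", "/backup/mytable"   // hypothetical table name and output dir
        };
        Map<String, String> conf = new HashMap<>();
        List<String> rest = parse(args, conf);
        System.out.println(conf.get("mapred.output.compression.type")); // prints BLOCK
        System.out.println(rest); // prints [mytable, /backup/mytable]
    }
}
```

The point of the design is that the export tool itself never has to know about compression: anything passed via -D lands in the configuration before the job is built.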

We need to improve the documentation around this.

While looking into this, I also added support for exporting only a specific column family and for turning off the block cache during export. I will create a separate JIRA for that and post a patch.



> HBase exporter should compress output files by default (or at least allow this as an option)
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3166
>                 URL: https://issues.apache.org/jira/browse/HBASE-3166
>             Project: HBase
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.20.6
>            Reporter: Josh Rosenblum
>            Priority: Minor
>
> The HBase exporter puts (key, Result) pairs as keys and values into an output sequence file.
> There could be significant savings at low cost if at least default compression were enabled on this output sequence file.
> In createSubmittableJob(), this might be as simple as adding the following:
>         SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
>         SequenceFileOutputFormat.setCompressOutput(job, true);
>         FileOutputFormat.setOutputCompressorClass(job, DefaultCodec.class);
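To give a rough sense of why BLOCK is the suggested compression type: in a SequenceFile, RECORD compression compresses each value on its own, while BLOCK compression buffers many records and compresses keys and values in batches, so redundancy across rows (similar row keys, repeated column names) can be exploited. The sketch below only illustrates that effect with java.util.zip.Deflater (DEFLATE, the algorithm behind DefaultCodec). It does not write the SequenceFile format, it compresses whole records rather than values only, and the sample rows and class name are invented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

/**
 * Illustrative sketch only (not Hadoop's SequenceFile format):
 * compares deflating each record separately ("per-record") with
 * deflating the concatenation of all records ("per-block").
 */
class BlockVsRecordCompression {

    /** Deflates the input at the default level and returns the compressed size in bytes. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    /** Hypothetical rows loosely resembling exported row key / column / value data. */
    static List<byte[]> sampleRecords(int n) {
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String rec = String.format("row-%06d\tinfo:status=active\tinfo:region=us-east\n", i);
            records.add(rec.getBytes(StandardCharsets.UTF_8));
        }
        return records;
    }

    /** Total uncompressed size of all records. */
    static int rawSize(List<byte[]> records) {
        int total = 0;
        for (byte[] r : records) total += r.length;
        return total;
    }

    /** Sum of sizes when every record is deflated independently. */
    static int perRecordSize(List<byte[]> records) {
        int total = 0;
        for (byte[] r : records) total += deflatedSize(r);
        return total;
    }

    /** Size when all records are concatenated and deflated as one block. */
    static int perBlockSize(List<byte[]> records) {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        for (byte[] r : records) block.write(r, 0, r.length);
        return deflatedSize(block.toByteArray());
    }

    public static void main(String[] args) {
        List<byte[]> records = sampleRecords(1000);
        System.out.println("raw bytes:       " + rawSize(records));
        System.out.println("per-record zlib: " + perRecordSize(records));
        System.out.println("per-block zlib:  " + perBlockSize(records));
    }
}
```

Actual savings on a real export depend heavily on the data, so it is worth comparing output sizes on a sample of your own table rather than relying on any particular ratio.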
