[ https://issues.apache.org/jira/browse/HBASE-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925977#action_12925977 ]
Kannan Muthukkaruppan commented on HBASE-3166:
----------------------------------------------
Yes, I ran into this recently.
It turns out the compression part is already possible. The "export" tool uses
GenericOptionsParser, which accepts configuration settings as -D options on the
command line.
{code}
bin/hadoop jar <pathToHBaseJar> export \
  -D mapred.output.compress=true \
  -D mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
  -D mapred.output.compression.type=BLOCK \
  <tablename> <outputdirname>
{code}
We need to improve the documentation around this.
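To make the -D mechanism a bit more concrete, here is a minimal, self-contained sketch of how such options can be split from the tool's own arguments. This is a simplified stand-in, not Hadoop's actual GenericOptionsParser: the class name GenericOptionsSketch, its fields, and the table name and output path in main() are all hypothetical, and the sketch assumes that option parsing stops at the first non-option argument.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for illustration only; NOT Hadoop's GenericOptionsParser.
final class GenericOptionsSketch {
    // key=value settings collected from -D options, in the order given
    final Map<String, String> conf = new LinkedHashMap<>();
    // arguments left for the tool itself (e.g. table name, output directory)
    final List<String> remaining = new ArrayList<>();

    GenericOptionsSketch(String... args) {
        int i = 0;
        while (i < args.length) {
            String pair;
            if (args[i].equals("-D") && i + 1 < args.length) {
                pair = args[i + 1];          // form: -D key=value
                i += 2;
            } else if (args[i].startsWith("-D") && args[i].length() > 2) {
                pair = args[i].substring(2); // form: -Dkey=value
                i += 1;
            } else {
                break; // assumption: the first non-option argument ends option parsing
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value, got: " + pair);
            }
            conf.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        // everything from the first non-option argument onward is passed through
        for (; i < args.length; i++) {
            remaining.add(args[i]);
        }
    }

    public static void main(String[] args) {
        // "mytable" and "/backup/mytable" are hypothetical placeholders
        GenericOptionsSketch parsed = new GenericOptionsSketch(
            "-D", "mapred.output.compress=true",
            "-D", "mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec",
            "-D", "mapred.output.compression.type=BLOCK",
            "mytable", "/backup/mytable");
        System.out.println("settings:  " + parsed.conf);
        System.out.println("tool args: " + parsed.remaining);
    }
}
```

Under that assumption, the -D settings have to come before <tablename> and <outputdirname>, as in the command above; anything after the first positional argument is handed to the export tool unchanged.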
While looking into this, I also added support for exporting just a specific
column family, and for turning the block cache off during export. I will
create a separate JIRA for those changes and post a patch.
> HBase exporter should compress output files by default (or at least allow
> this as an option)
> --------------------------------------------------------------------------------------------
>
> Key: HBASE-3166
> URL: https://issues.apache.org/jira/browse/HBASE-3166
> Project: HBase
> Issue Type: Improvement
> Components: util
> Affects Versions: 0.20.6
> Reporter: Josh Rosenblum
> Priority: Minor
>
> The HBase exporter puts (key, Result) pairs as keys and values into an output
> sequence file.
> There could be significant savings at low cost if at least default
> compression was enabled on this output sequence file.
> In createSubmittableJob(), this might be as simple as adding the following:
> SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
> SequenceFileOutputFormat.setCompressOutput(job, true);
> FileOutputFormat.setOutputCompressorClass(job, DefaultCodec.class);