[
https://issues.apache.org/jira/browse/HBASE-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908253#action_12908253
]
Andrew Purtell commented on HBASE-2987:
---------------------------------------
I also agree that slow compactions can trip up writers. In this case, as I
experimented, I was able to tune the system so that compactions were not
significant, but I could do nothing about slow flushes.
> Avoid compressing flush files
> -----------------------------
>
> Key: HBASE-2987
> URL: https://issues.apache.org/jira/browse/HBASE-2987
> Project: HBase
> Issue Type: Improvement
> Reporter: Andrew Purtell
> Assignee: Andrew Purtell
> Priority: Minor
> Attachments: HBASE-2987.patch
>
>
> I've extended Hadoop compression to use the LZMA algorithm and HFile to
> provide an option for selecting it. With typical input, the LZMA algorithm
> produces 30% smaller output than GZIP at max compression (which is currently
> the best available option for HFiles) and 15% smaller output than BZIP2. I'm
> aware of the "disk is cheap" mantra but for a multi-peta-scale archival
> application, where we still want random read and random update capabilities,
> 30% less disk is a substantial cost savings. LZMA compression speed is ~1
> MB/second on a 2 GHz CPU; decompression speed is ~20 MB/second. This is 4x
> slower than BZIP2 to compress but at least 2x faster to decompress, for 15%
> better results. For an archival application these properties would be
> acceptable if not for the very significant problem of flushing. Obviously the
> low throughput of the LZMA compressor means it is unsuitable for foreground
> processing. In HBase terms, it can be used for compaction but not for flush
> files.
> The attached patch, against the 0.20 branch, turns off compression for
> flushes. This could be implemented as a config option, but I wonder whether,
> with the possible exception of LZO, we should be compressing flushes at all.
> Any significant reduction in flush throughput can stall writers during
> periods of high write activity. Maybe globally disabling compression on
> flush files is a good thing?
> I have tested this and confirmed the result is the desired behavior: 'file'
> shows flush files as uncompressed data, compacted files as compressed.
> Compaction merges files with different compression properties. LZMA provides
> rather extreme space savings over the other available options without slowing
> down writers if the regionservers are configured with enough write buffering
> to ride over the significantly lengthened compaction times.
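
To put rough numbers on the description above, here is a minimal sketch of the flush-versus-compaction codec choice and of the time a compressor would spend on one flush. It is an illustration only, not the attached patch: the function names are hypothetical, the 64 MB flush size is an assumption, and the BZIP2 rate is simply derived from the "4x slower" figure quoted in the description. Disk write time is ignored.

```python
# Illustrative sketch only; not the HBASE-2987 patch and not HBase API.

# Figures taken from the issue description.
LZMA_COMPRESS_MB_S = 1.0                       # "~1 MB/second on a 2 GHz CPU"
BZIP2_COMPRESS_MB_S = 4.0 * LZMA_COMPRESS_MB_S  # LZMA is "4x slower than BZIP2"

# Assumption: size of one flush in MB (not stated in the issue).
ASSUMED_FLUSH_SIZE_MB = 64.0


def codec_for_write(configured_codec: str, is_flush: bool) -> str:
    """Pick the codec for a new store file (hypothetical helper).

    Flush files are written uncompressed; compacted files use the
    codec configured for the column family.
    """
    return "none" if is_flush else configured_codec


def compress_seconds(size_mb: float, throughput_mb_s: float) -> float:
    """Time the compressor alone needs to process size_mb of data."""
    return size_mb / throughput_mb_s


if __name__ == "__main__":
    for name, rate in (("lzma", LZMA_COMPRESS_MB_S), ("bzip2", BZIP2_COMPRESS_MB_S)):
        secs = compress_seconds(ASSUMED_FLUSH_SIZE_MB, rate)
        print(f"{name}: about {secs:.0f} s of compression per flush")
    print("flush codec:", codec_for_write("lzma", is_flush=True))
    print("compaction codec:", codec_for_write("lzma", is_flush=False))
```

Under these assumptions a single LZMA-compressed flush would keep the compressor busy for about a minute, which is the stall the description is concerned with; writing the flush uncompressed moves that cost to compaction.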