[jira] Commented: (HBASE-2987) Avoid compressing flush files

Andrew Purtell (JIRA) Fri, 10 Sep 2010 17:18:14 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908252#action_12908252
 ]


Andrew Purtell commented on HBASE-2987:
---------------------------------------

bq. unless you're talking about the first flush not finishing before it's time 
for the second flush?

Yes, and the write gate comes down because the memstore limit on the region is 
reached.

> Avoid compressing flush files
> -----------------------------
>
>                 Key: HBASE-2987
>                 URL: https://issues.apache.org/jira/browse/HBASE-2987
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Minor
>         Attachments: HBASE-2987.patch
>
>
> I've extended Hadoop compression to use the LZMA algorithm and HFile to 
> provide an option for selecting it. With typical input, the LZMA algorithm 
> produces 30% smaller output than GZIP at max compression (which is currently 
> the best available option for HFiles) and 15% smaller output than BZIP2. I'm 
> aware of the "disk is cheap" mantra but for a multi-peta-scale archival 
> application, where we still want random read and random update capabilities, 
> 30% less disk is a substantial cost savings. LZMA compression speed is ~1 
> MB/second on a 2 GHz CPU; decompression speed is ~20 MB/second. This is 4x 
> slower than BZIP2 to compress but at least 2x faster to decompress, for 15% 
> better results. For an archival application these properties would be 
> acceptable if not for the very significant problem of flushing. Obviously the 
> low throughput of the LZMA compressor means it is unsuitable for foreground 
> processing. In HBase terms, it can be used for compaction but not for flush 
> files. 
> The attached patch, against the 0.20 branch, turns off compression for 
> flushes. This could be implemented as a config option, but I wonder whether, 
> with the possible exception of LZO, we should be compressing flushes at all. 
> Any significant reduction in flush throughput can stall writers during 
> periods of high write activity. Maybe globally disabling compression on 
> flush files is a good thing? 
> I have tested this and confirmed the result is the desired behavior: 'file' 
> shows flush files as uncompressed data, compacted files as compressed. 
> Compaction merges files with different compression properties. LZMA provides 
> rather extreme space savings over the other available options without slowing 
> down writers if the regionservers are configured with enough write buffering 
> to ride over the significantly lengthened compaction times.
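
The flush concern in the description can be made concrete with a rough calculation. The sketch below is illustrative only: the ~1 MB/second LZMA compression figure comes from the description, while the memstore size and ingest rate are assumed values, and the stall condition is a simplification (the actual write gate depends on the region's configured memstore limit, not on this comparison alone).

```python
# Back-of-the-envelope flush timing. Only the LZMA throughput figure comes
# from the issue text; the memstore size and write rate are assumptions.

LZMA_COMPRESS_MB_PER_S = 1.0   # from the description: ~1 MB/s on a 2 GHz CPU
MEMSTORE_FLUSH_MB = 64.0       # assumed flush threshold, for illustration
INGEST_MB_PER_S = 5.0          # assumed sustained write rate into one region


def flush_seconds(memstore_mb, compress_mb_per_s):
    """Time to write one flush file if the compressor is the bottleneck."""
    return memstore_mb / compress_mb_per_s


def refill_seconds(memstore_mb, ingest_mb_per_s):
    """Time for writers to fill a fresh memstore up to the flush threshold."""
    return memstore_mb / ingest_mb_per_s


def next_flush_due_before_done(memstore_mb, compress_mb_per_s, ingest_mb_per_s):
    """True when the second flush comes due before the first one finishes."""
    return (flush_seconds(memstore_mb, compress_mb_per_s)
            > refill_seconds(memstore_mb, ingest_mb_per_s))


if __name__ == "__main__":
    print("flush takes   %.1f s" % flush_seconds(MEMSTORE_FLUSH_MB, LZMA_COMPRESS_MB_PER_S))
    print("refill takes  %.1f s" % refill_seconds(MEMSTORE_FLUSH_MB, INGEST_MB_PER_S))
    print("writers at risk:", next_flush_due_before_done(
        MEMSTORE_FLUSH_MB, LZMA_COMPRESS_MB_PER_S, INGEST_MB_PER_S))
```

Under these assumed numbers the compressor-bound flush takes far longer than the memstore takes to refill, which is the situation the comment above describes: the first flush has not finished when the second comes due. Writing flush files uncompressed removes the compressor from that path and leaves the slow compression to compaction.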

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
