[ https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans updated HBASE-5778:
--------------------------------------
Attachment: HBASE-5778-0.94-v4.patch
This v4 of the patch pushes the handling of reopened compressed files down to
SequenceFileLogReader. The two main changes:
- SequenceFileLogReader needs a way to be reused across multiple
open/seek/close cycles. For this I added a method called "reopen". The name
might be confusing.
- ReplicationSource used to just bluntly reopen whatever currentPath was, but
that no longer works now that the SequenceFileLogReader (SFLR) is kept around.
To fix it I had to add a little dance in ReplicationHLogReader that checks
whether the path it was given is different (which can happen even for the
same file, once it has been moved to .oldlogs).
The HLog and Replication tests pass.
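A minimal, hypothetical Java sketch of the two changes above, assuming a toy compression scheme rather than HBase's real one: the first occurrence of a value is written literally ("L:value") and later occurrences as an index into a dictionary ("R:index") that the reader rebuilds as it reads. Because later entries refer back to earlier ones, a reader that reopens a growing log to see new edits has to keep that dictionary, which is what reopen() does here. ReaderManager.openReader() is an invented stand-in for the path check, and it assumes a log moved to .oldlogs keeps its file name and only changes directory. None of the names (WalReaderSketch, ReusableLogReader, ReaderManager, isSameLog, FS) are HBase APIs, and the paths are made up.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not HBase code: a log reader whose compression
// dictionary survives reopen(), plus a "same log, moved directory" check.
public class WalReaderSketch {
  // Toy filesystem: path -> encoded entries appended by a writer.
  static final Map<Path, List<String>> FS = new HashMap<>();

  // Toy dictionary compression: first occurrence "L:value", repeats "R:index".
  static List<String> encode(List<String> values, List<String> dict) {
    List<String> out = new ArrayList<>();
    for (String v : values) {
      int idx = dict.indexOf(v);
      if (idx >= 0) {
        out.add("R:" + idx);
      } else {
        dict.add(v);
        out.add("L:" + v);
      }
    }
    return out;
  }

  static class ReusableLogReader {
    private List<String> dict = new ArrayList<>();
    private List<String> entries = new ArrayList<>();
    private int pos;

    // Fresh open: a new log starts with an empty dictionary at position 0.
    void open(Path p) {
      dict = new ArrayList<>();
      entries = new ArrayList<>(FS.get(p)); // snapshot of the length seen now
      pos = 0;
    }

    // Reopen the same log to see appended entries; dict and pos are kept.
    void reopen(Path p) {
      entries = new ArrayList<>(FS.get(p));
    }

    void seek(int position) { pos = position; }

    // Returns the next decoded value, or null at the end of the snapshot.
    String next() {
      if (pos >= entries.size()) {
        return null;
      }
      String e = entries.get(pos++);
      if (e.startsWith("L:")) {
        String v = e.substring(2);
        dict.add(v); // rebuild the dictionary in the same order as the writer
        return v;
      }
      return dict.get(Integer.parseInt(e.substring(2)));
    }
  }

  // Stand-in for the path check: new log -> open(), same log -> reopen().
  static class ReaderManager {
    final ReusableLogReader reader = new ReusableLogReader();
    private Path lastPath;

    ReusableLogReader openReader(Path p) {
      if (lastPath == null || !isSameLog(lastPath, p)) {
        reader.open(p);
      } else {
        reader.reopen(p);
      }
      lastPath = p;
      return reader;
    }

    // Assumption: a moved log keeps its file name and only changes directory.
    static boolean isSameLog(Path a, Path b) {
      return a.equals(b) || a.getFileName().equals(b.getFileName());
    }
  }

  public static void main(String[] args) {
    Path live = Paths.get("/hbase/.logs/rs1/hlog.1");
    Path old = Paths.get("/hbase/.oldlogs/hlog.1");
    List<String> writerDict = new ArrayList<>();
    FS.put(live, new ArrayList<>(encode(List.of("t1", "t1"), writerDict)));

    ReaderManager mgr = new ReaderManager();
    ReusableLogReader r = mgr.openReader(live);
    System.out.println(r.next() + " " + r.next()); // t1 t1

    // The writer appends more edits, then the log is archived to .oldlogs.
    FS.get(live).addAll(encode(List.of("t1", "t2"), writerDict));
    FS.put(old, FS.remove(live));

    r = mgr.openReader(old); // same file name, so reopen() keeps the dictionary
    System.out.println(r.next() + " " + r.next()); // t1 t2
  }
}
```

In this toy model, calling open() on the moved path and then seek(2) would hit the "R:0" reference with an empty dictionary and throw IndexOutOfBoundsException; keeping the reader (and its dictionary) around across reopens is what avoids that.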
> Turn on WAL compression by default
> ----------------------------------
>
> Key: HBASE-5778
> URL: https://issues.apache.org/jira/browse/HBASE-5778
> Project: HBase
> Issue Type: Improvement
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Priority: Blocker
> Fix For: 0.96.0
>
> Attachments: 5778.addendum, 5778-addendum.txt, HBASE-5778-0.94.patch,
> HBASE-5778-0.94-v2.patch, HBASE-5778-0.94-v3.patch, HBASE-5778-0.94-v4.patch,
> HBASE-5778.patch
>
>
> I ran some tests to determine whether WAL compression should be turned on by
> default.
> For a use case where it's not very useful (values two orders of magnitude
> bigger than the keys), the insert time didn't change and CPU usage was 15%
> higher (150% CPU usage vs. 130% when not compressing the WAL).
> When values are smaller than the keys, I saw a 38% improvement in insert run
> time and CPU usage was 33% higher (600% CPU usage vs. 450%). I'm not sure
> WAL compression accounts for all of the additional CPU usage; it might just be
> that we're able to insert faster and so spend more time in the MemStore per
> second (because our MemStores are bad when they contain tens of thousands of
> values).
> Those are two extremes, but they show that for the price of some CPU we can
> save a lot. My machines have two quad-core CPUs with Hyper-Threading, so I
> still had a lot of idle CPUs.
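The CPU figures in the description are relative increases over the uncompressed run, not differences in percentage points: 150% vs. 130% is 20 points, which is about 15% of 130, and 600% vs. 450% is 150 points, or about 33% of 450. A trivial Java check of that arithmetic, using only the numbers reported above (the class and method names are made up for illustration):

```java
import java.util.Locale;

// Recomputes the relative CPU overheads reported in the issue description.
public class CpuOverhead {
  // Relative increase of `with` over `without`, as a percentage.
  static double relativeIncreasePct(double with, double without) {
    return (with - without) / without * 100.0;
  }

  public static void main(String[] args) {
    // Values about 100x larger than the keys: 150% vs. 130% CPU usage.
    System.out.printf(Locale.ROOT, "large values: +%.1f%% CPU%n", relativeIncreasePct(150, 130));
    // Values smaller than the keys: 600% vs. 450% CPU usage.
    System.out.printf(Locale.ROOT, "small values: +%.1f%% CPU%n", relativeIncreasePct(600, 450));
    // prints "large values: +15.4% CPU" then "small values: +33.3% CPU"
  }
}
```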
--
This message is automatically generated by JIRA.