[ https://issues.apache.org/jira/browse/HBASE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-5778:
--------------------------------------

    Attachment: HBASE-5778-0.94.patch

Attaching a first pass at making replication and HLog compression best buddies
(here against 0.94).

Most of the changes are the compression context leaking into other classes,
since I need it all over the place.

The meatier part is keeping track of the context in ReplicationSource.
Basically, we get a new one the first time we read the HLog, then we just keep
passing it back to the reader. I made sure to set the context to null when
sending {{WALEdits}} to the sink.
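A minimal sketch of that context handling, in plain Java with no HBase dependencies. `ReplicationSourceSketch`, `CompressionContext` and `Entry` are hypothetical stand-ins, not the real HBase classes, and the dictionary is reduced to a list of strings. It only shows one context being created lazily on the first read, reused for every later read, and cleared on each entry before a batch goes to the sink.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the WAL compression context: just a dictionary
// of strings already seen in this log (e.g. region or table names).
final class CompressionContext {
    final List<String> dictionary = new ArrayList<>();
}

// Hypothetical stand-in for a log entry carrying a reference to the context.
final class Entry {
    final String region;
    CompressionContext context;

    Entry(String region, CompressionContext context) {
        this.region = region;
        this.context = context;
    }
}

// Hypothetical sketch of a replication source holding on to one context.
final class ReplicationSourceSketch {
    private CompressionContext compressionContext; // null until the first read

    // Creates the context on the first read, then reuses it for every later
    // read so the dictionary keeps growing instead of starting over.
    Entry readNext(String region) {
        if (compressionContext == null) {
            compressionContext = new CompressionContext();
        }
        if (!compressionContext.dictionary.contains(region)) {
            compressionContext.dictionary.add(region);
        }
        return new Entry(region, compressionContext);
    }

    // Clears the per-entry context before the batch is handed to the sink
    // (assumption: the sink should not receive the reader's dictionary state).
    // The source itself keeps its context for subsequent reads.
    List<Entry> prepareForShipping(List<Entry> batch) {
        for (Entry e : batch) {
            e.context = null;
        }
        return batch;
    }

    CompressionContext context() {
        return compressionContext;
    }
}
```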

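The second part was handling the missing dictionary entries when recovering a
log from a last known position: the dictionary is built up as the log is read,
so starting in the middle of the file leaves out every entry added before that
point. I could think of a few solutions: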

 - Reset the last known position back to 0 and resend all the edits. Basically 
this ignores the problem.
 - Add a "fast forward" method in the code to just read the file up to the last 
known position.
 - Introduce new checks in order to read the log from 0 (using the normal code 
path) but then skip all the entries until we get to the last known position.

I implemented the last one. It adds a lot of new state to track, which I don't
like, but it should be "correct".
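A toy illustration of the problem and of the option implemented. It models the log as a list of tokens and uses entry indexes instead of byte offsets for the "last known position"; `Token`, `SkippingReader` and its methods are hypothetical and do not reflect the real HLog format or reader API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a dictionary-compressed log, NOT the real HLog format: each
// token is either a literal (first occurrence of a value, which also adds it
// to the dictionary) or an index into the dictionary built so far.
final class Token {
    final String literal; // non-null for a literal token
    final int index;      // meaningful only when literal == null

    private Token(String literal, int index) {
        this.literal = literal;
        this.index = index;
    }

    static Token literal(String value) { return new Token(value, -1); }

    static Token ref(int index) { return new Token(null, index); }
}

final class SkippingReader {
    // Writer side: emits a literal the first time a value is seen, a reference afterwards.
    static List<Token> compress(List<String> values) {
        List<String> dict = new ArrayList<>();
        List<Token> log = new ArrayList<>();
        for (String v : values) {
            int i = dict.indexOf(v);
            if (i >= 0) {
                log.add(Token.ref(i));
            } else {
                dict.add(v);
                log.add(Token.literal(v));
            }
        }
        return log;
    }

    // Decodes from `start` with an empty dictionary, returning entries at or after `emitFrom`.
    private static List<String> decode(List<Token> log, int start, int emitFrom) {
        List<String> dict = new ArrayList<>();
        List<String> out = new ArrayList<>();
        for (int pos = start; pos < log.size(); pos++) {
            Token t = log.get(pos);
            String value;
            if (t.literal != null) {
                dict.add(t.literal);
                value = t.literal;
            } else {
                value = dict.get(t.index); // throws if the literal was never read
            }
            if (pos >= emitFrom) {
                out.add(value);
            }
        }
        return out;
    }

    // Seeking straight to the last known position: the dictionary is missing
    // everything written before it, so references cannot be resolved.
    static List<String> seekAndRead(List<Token> log, int lastPosition) {
        return decode(log, lastPosition, lastPosition);
    }

    // The third option: read from 0 through the normal path so the dictionary
    // is rebuilt, but skip entries until the last known position.
    static List<String> readAndSkip(List<Token> log, int lastPosition) {
        return decode(log, 0, lastPosition);
    }
}
```

For example, with values `t1, t2, t1, t2, t3` and a last known position of 2, `readAndSkip` returns `t1, t2, t3`, while `seekAndRead` throws `IndexOutOfBoundsException` because the dictionary never saw the first two literals.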

I also added a new test that simply enables WAL compression on TestReplication's
master cluster. Everything passes.
                
> Turn on WAL compression by default
> ----------------------------------
>
>                 Key: HBASE-5778
>                 URL: https://issues.apache.org/jira/browse/HBASE-5778
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Jean-Daniel Cryans
>            Assignee: Lars Hofhansl
>            Priority: Blocker
>             Fix For: 0.96.0
>
>         Attachments: 5778.addendum, 5778-addendum.txt, HBASE-5778-0.94.patch, 
> HBASE-5778.patch
>
>
> I ran some tests to verify if WAL compression should be turned on by default.
> For a use case where it's not very useful (values two orders of magnitude
> bigger than the keys), the insert time wasn't different and CPU usage was 15%
> higher (150% CPU usage vs. 130% when not compressing the WAL).
> When values are smaller than the keys, I saw a 38% improvement in insert run
> time and CPU usage was 33% higher (600% CPU usage vs. 450%). I'm not sure
> WAL compression accounts for all the additional CPU usage; it might just be
> that we're able to insert faster and so spend more time in the MemStore per
> second (because our MemStores are bad when they contain tens of thousands of
> values).
> Those are two extremes, but they show that for the price of some CPU we can
> save a lot. My machines have two quad-core CPUs with Hyper-Threading, so I
> still had a lot of idle CPUs.
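The relative CPU increases quoted in the description follow directly from the reported utilization figures:

```latex
\frac{150\% - 130\%}{130\%} = \frac{20}{130} \approx 15.4\%
\qquad
\frac{600\% - 450\%}{450\%} = \frac{150}{450} \approx 33.3\%
```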

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
