Well, you don't have to do this often; just try it once to see. You can
do it in the HBase shell:

major_compact '.META.'

And it takes 3-4 seconds.
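And since you asked about doing it from code rather than the shell: the
same thing is exposed on HBaseAdmin in the Java client. A rough sketch
against the 0.20-era API (untested here; it needs a running cluster and
an hbase-site.xml on the classpath):

```java
// Rough sketch: ask HBase to major-compact .META. from the Java client,
// equivalent to `major_compact '.META.'` in the shell.
// Assumes an HBase 0.20-era classpath and a reachable cluster.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CompactMeta {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // The request is asynchronous; the servers do the actual compaction.
    admin.majorCompact(".META.");
  }
}
```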

J-D

On Tue, Oct 6, 2009 at 12:31 PM, Adam Silberstein
<[email protected]> wrote:
> Hi J-D,
> Thanks for the tips.  Tweaking the multiplier looks easy enough.  I'm not 
> sure how to force a major compaction.  If from M/R, does that mean you did it 
> with the HDFS/Hadoop API?  Any guess on how long that major compaction takes? 
>  Just wondering what it does to availability.
>
> Thanks,
> Adam
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel 
> Cryans
> Sent: Tuesday, October 06, 2009 9:11 AM
> To: [email protected]
> Subject: Re: random read/write performance
>
> Adam,
>
> Few thoughts:
>
>  - Do you use LZO?
>  - Instead of disabling the WAL, try first tweaking the safety net
> that's in place. For example, setting
> hbase.regionserver.logroll.multiplier to 1.5 or even higher will make
> it roll less often. The current value of 0.95 means you roll every
> ~62MB inserted in a regionserver. You can also set
> hbase.regionserver.maxlogs to something higher than 32, like 64.
>  - We flush the .META. table very, very often, and this sometimes
> leaves it with a lot of store files after a big upload. Once, during
> an MR job, I forced a major compaction on it and speed went up about
> 500% because all the clients had been contending on those files.
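> For reference, both log settings above go in hbase-site.xml. A sketch
> with the values I suggested (the exact numbers are a starting point to
> experiment with, not gospel):
>
>   <property>
>     <name>hbase.regionserver.logroll.multiplier</name>
>     <value>1.5</value>
>   </property>
>   <property>
>     <name>hbase.regionserver.maxlogs</name>
>     <value>64</value>
>   </property>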
>
> J-D
>
> On Tue, Oct 6, 2009 at 11:59 AM, Adam Silberstein
> <[email protected]> wrote:
>> Hi,
>>
>> Just wanted to give a quick update on our HBase benchmarking efforts at
>> Yahoo.  The basic use case we're looking at is:
>>
>> 1KB records
>>
>> 20GB of records per node (and 6GB of memory per node, so data is not
>> memory resident)
>>
>> Workloads that do random reads/writes (e.g. 95% reads, 5% writes).
>>
>> Multiple clients doing the reads/writes (i.e. 50-200)
>>
>> Measure throughput vs. latency, and see how high we can push the
>> throughput.
>>
>> Note that although we want to see where throughput maxes out, the
>> workload is random, rather than scan-oriented.
>>
>>
>>
>> I've been tweaking our HBase installation based on advice I've
>> read/gotten from a few people.  Currently, I'm running 0.20.0, have heap
>> size set to 6GB per server, and have iCMS off.  I'm still using the REST
>> server instead of the java client.  We're about to move our benchmarking
>> tool to java, so at that point we can switch to the java API and turn
>> off the WAL as well.  If anyone has more suggestions for this
>> workload (either things to try while still using REST, or things to try
>> once I have a java client), please let me know.
>>
>>
>>
>> Given all that, I'm currently seeing a maximum throughput of about 300
>> ops/sec/server.  Has anyone with a similar disk-resident, random
>> workload seen drastically different numbers, or any guesses as to what
>> I can expect with the java client?
>>
>>
>>
>> Thanks!
>>
>> Adam
>>
>>
>
