I agree with this. 

I also think we should leave the default as is with the caveat that we call out 
the durability versus write performance tradeoff in the flushlogentries 
description and up on the wiki somewhere, maybe on 
http://wiki.apache.org/hadoop/PerformanceTuning . We could also provide two 
example configurations, one for performance (reasonable tradeoffs), one for 
paranoia. I put up an issue: https://issues.apache.org/jira/browse/HBASE-1984
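
To make the tradeoff concrete, here is a minimal sketch in plain Java. It is a toy model, not HBase code: it assumes the log is synced as soon as the number of unsynced edits reaches hbase.regionserver.flushlogentries. The class and method names (FlushThresholdSketch, replay) are made up for illustration. It counts how many syncs a stream of edits costs and how many edits could be lost at worst if the server crashed between syncs.

```java
// Toy model only: not HBase code. Assumes the log is synced once the number
// of unsynced edits reaches the threshold (hbase.regionserver.flushlogentries).
class FlushThresholdSketch {

    /** Outcome of replaying a stream of edits against one threshold. */
    static final class Result {
        final int syncCalls;       // how many times the log was synced
        final int maxEditsAtRisk;  // most edits ever left unsynced after an append

        Result(int syncCalls, int maxEditsAtRisk) {
            this.syncCalls = syncCalls;
            this.maxEditsAtRisk = maxEditsAtRisk;
        }
    }

    static Result replay(int flushLogEntries, int totalEdits) {
        if (flushLogEntries < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        int unsynced = 0;
        int syncCalls = 0;
        int maxAtRisk = 0;
        for (int i = 0; i < totalEdits; i++) {
            unsynced++;                          // edit appended to the log
            if (unsynced >= flushLogEntries) {   // threshold reached: sync
                unsynced = 0;
                syncCalls++;
            }
            // Edits still unsynced here would be lost if the server crashed now.
            maxAtRisk = Math.max(maxAtRisk, unsynced);
        }
        return new Result(syncCalls, maxAtRisk);
    }

    public static void main(String[] args) {
        for (int threshold : new int[] {1, 10, 100}) {
            Result r = replay(threshold, 1000);
            System.out.println("flushlogentries=" + threshold
                    + " syncs=" + r.syncCalls
                    + " maxEditsAtRisk=" + r.maxEditsAtRisk);
        }
    }
}
```

For 1,000 edits the model gives 1000, 100 and 10 syncs for thresholds of 1, 10 and 100, with at most 0, 9 and 99 edits at risk. That lines up with the "99 or 9 edits max" figure quoted further down the thread. It says nothing about actual throughput, which would have to be measured.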

    - Andy




________________________________
From: Ryan Rawson <ryano...@gmail.com>
To: hbase-dev@hadoop.apache.org
Sent: Sat, November 14, 2009 11:22:13 PM
Subject: Re: Should we change the default value of hbase.regionserver.flushlogentries for 0.21?

That sync at the end of an RPC is my doing. You don't want to sync every
_EDIT_; after all, the previous definition of the word "edit" was each
KeyValue.  So we could be calling sync for every single column in a
row. Bad stuff.

In the end, if the regionserver crashes during a batch put, we will
never know how much of the batch was flushed to the WAL. Thus it makes
sense to sync only once per RPC and get a massive, massive speedup.
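
A minimal sketch of that behaviour, in plain Java. This is a toy model, not the regionserver code: ToyLog and batchPut are hypothetical names, and the model pessimistically treats every unsynced edit as lost on a crash, whereas in reality some unknown part of the batch may have reached the WAL.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: not HBase code. One sync per batch put, issued after the
// last edit of the batch has been appended.
class BatchSyncSketch {

    /** Hypothetical in-memory stand-in for a write-ahead log. */
    static final class ToyLog {
        private final List<String> appended = new ArrayList<>();
        private int syncedCount = 0;
        private int syncCalls = 0;

        void append(String edit) { appended.add(edit); }

        void sync() {
            syncedCount = appended.size();  // everything appended so far is durable
            syncCalls++;
        }

        int syncedCount() { return syncedCount; }
        int syncCalls() { return syncCalls; }
        int unsyncedCount() { return appended.size() - syncedCount; }
    }

    /**
     * Applies one batch put. A negative crashAfter means no crash; otherwise
     * the server "crashes" once crashAfter edits have been appended, before
     * the sync. Returns true if the batch completed and was synced.
     */
    static boolean batchPut(ToyLog log, List<String> edits, int crashAfter) {
        for (int i = 0; i < edits.size(); i++) {
            if (i == crashAfter) {
                return false;               // simulated crash: sync never runs
            }
            log.append(edits.get(i));
        }
        log.sync();                         // single sync for the whole RPC
        return true;
    }

    public static void main(String[] args) {
        List<String> edits = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            edits.add("edit-" + i);
        }

        ToyLog ok = new ToyLog();
        batchPut(ok, edits, -1);
        System.out.println("completed batch: syncs=" + ok.syncCalls()
                + " synced=" + ok.syncedCount());

        ToyLog crashed = new ToyLog();
        batchPut(crashed, edits, 400);
        System.out.println("crashed batch: synced=" + crashed.syncedCount()
                + " unsynced=" + crashed.unsyncedCount());
    }
}
```

In this model a completed 1,000-edit batch costs one sync instead of 1,000, and a crash partway through leaves the whole in-flight batch unsynced no matter what the per-edit setting is.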

On Sat, Nov 14, 2009 at 9:45 PM, stack <st...@duboce.net> wrote:
> I'm for leaving it as it is, at every 100 edits -- maybe every 10 edits?
> Speed stays as it was.  We used to lose MBs.  By default, we'll now lose 99
> or 9 edits max.
>
> We need to do some work bringing folks along regardless of what we decide.
> Flush happens at the end of the put up in the regionserver.  If you are
> doing a batch of commits -- e.g. using a big write buffer over on your
> client -- the puts will only be flushed on the way out after the batch put
> completes EVEN if you have configured hbase to sync every edit (I ran into
> this this evening.  J-D sorted me out).  We need to make sure folks are up
> on this.
>
> St.Ack
>
>
>
> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans
> <jdcry...@apache.org> wrote:
>
>> Hi dev!
>>
>> Hadoop 0.21 now has a reliable append and flush feature and this gives
>> us the opportunity to review some assumptions. The current situation:
>>
>> - Every edit going to a catalog table is flushed so there's no data loss.
>> - User table edits are flushed every
>> hbase.regionserver.flushlogentries edits, which by default is 100.
>>
>> Should we now set this value to 1 in order to have more durable but
>> slower inserts by default? Please speak up.
>>
>> Thx,
>>
>> J-D
>>
>



      
