That sync at the end of an RPC is my doing. You don't want to sync every _EDIT_; after all, under the previous definition of the word "edit" (each KeyValue), we could be calling sync for every single column in a row. Bad stuff.
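To make the cost concrete, here is a minimal sketch (not actual HBase code; the `WalSketch` class, its counters, and the edit strings are all hypothetical) contrasting a sync per KeyValue with a single sync at the end of the RPC:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the trade-off discussed in the thread:
// appending an edit is a cheap in-memory operation, while sync() stands
// in for the expensive call that forces edits to durable storage.
class WalSketch {
    static int syncCount = 0;                      // number of expensive sync calls issued
    static final List<String> log = new ArrayList<>();

    static void append(String edit) { log.add(edit); }  // cheap: buffer the edit
    static void sync()              { syncCount++; }    // expensive: flush to disk

    // Old interpretation: one sync per "edit", i.e. per KeyValue,
    // which means one sync per column touched in a row.
    static int putSyncEveryEdit(List<String> keyValues) {
        int before = syncCount;
        for (String kv : keyValues) { append(kv); sync(); }
        return syncCount - before;
    }

    // New behavior: buffer every edit in the batch, sync once at the
    // end of the RPC.
    static int putSyncOncePerRpc(List<String> keyValues) {
        int before = syncCount;
        for (String kv : keyValues) { append(kv); }
        sync();
        return syncCount - before;
    }

    public static void main(String[] args) {
        // Three columns in one row -> three KeyValues in one batch put.
        List<String> row = List.of("row1/cf:a", "row1/cf:b", "row1/cf:c");
        System.out.println("syncs per edit: " + putSyncEveryEdit(row));
        System.out.println("syncs per RPC:  " + putSyncOncePerRpc(row));
    }
}
```

With three columns, the per-edit scheme issues three syncs against one for the per-RPC scheme; the gap grows linearly with batch size, which is where the speedup comes from.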
In the end, if the regionserver crashes during a batch put, we will never know how much of the batch was flushed to the WAL. Thus it makes sense to sync only once per batch and get a massive, massive speedup.

On Sat, Nov 14, 2009 at 9:45 PM, stack <st...@duboce.net> wrote:
> I'm for leaving it as it is, at every 100 edits -- maybe every 10 edits?
> Speed stays as it was. We used to lose MBs. By default, we'll now lose 99
> or 9 edits max.
>
> We need to do some work bringing folks along regardless of what we decide.
> Flush happens at the end of the put up in the regionserver. If you are
> doing a batch of commits -- e.g. using a big write buffer over on your
> client -- the puts will only be flushed on the way out after the batch put
> completes EVEN if you have configured hbase to sync every edit (I ran into
> this this evening. J-D sorted me out). We need to make sure folks are up
> on this.
>
> St.Ack
>
>
> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans
> <jdcry...@apache.org> wrote:
>
>> Hi dev!
>>
>> Hadoop 0.21 now has a reliable append and flush feature and this gives
>> us the opportunity to review some assumptions. The current situation:
>>
>> - Every edit going to a catalog table is flushed so there's no data loss.
>> - The user tables' edits are flushed every
>> hbase.regionserver.flushlogentries, which by default is 100.
>>
>> Should we now set this value to 1 in order to have more durable but
>> slower inserts by default? Please speak up.
>>
>> Thx,
>>
>> J-D
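For anyone following along, the knob under discussion is set in hbase-site.xml; a sketch of the "durable but slower" option J-D proposes (the property name and its default of 100 are taken from the thread above):

```xml
<!-- hbase-site.xml: sync the WAL after every edit instead of every 100 -->
<property>
  <name>hbase.regionserver.flushlogentries</name>
  <value>1</value>
</property>
```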