Re: Should we change the default value of hbase.regionserver.flushlogentries for 0.21?

Ryan Rawson Fri, 11 Dec 2009 13:36:20 -0800

Looks like deferred log flush is the clear winner here, and probably
has a smaller chance of loss than the 100 logflushentries.


I dare say we should ship with that as the default...

-ryan

On Thu, Dec 10, 2009 at 6:02 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> So to satisfy Ryan's thirst for cluster numbers, here they are:
>
> Default (with write buffer)
> 65 060ms
>
> The rest is without the write buffer (which is so well optimized that
> we only sync once per 2MB batch). I ran it once with entries=1 because
> it's taking so long.
>
> 1 logflushentries
> 2 188 737ms
>
> 100 logflushentries
> 697 590ms
> 698 082ms
>
> deferred log flush
> 545 836ms
> 532 788ms
>
> The cluster is composed of 15 i7s (a bit overkill) but it shows that
> it runs much slower because of network, replication, etc.
>
> Also on another cluster (same hardware) I did some 0.20 testing:
>
> With write buffer:
> 131 811ms
>
> Without:
> 602 842ms
>
> Keep in mind that the sync we call isn't HDFS-265.
>
> J-D
>
> On Thu, Dec 3, 2009 at 9:53 PM, stack <st...@duboce.net> wrote:
>> Thanks for picking up this discussion again J-D.
>>
>> See below.
>>
>> On Thu, Dec 3, 2009 at 3:24 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>
>>> I have the feeling that this discussion isn't over, there's no
>>> consensus yet, so I did some tests to get some numbers.
>>>
>>> PE sequentialWrite 1 with the write buffer disabled (I get the same
>>> numbers on every different config with it) on a standalone setup.
>>
>>
>> The write buffer is disabled because otherwise it will get in the way of the
>> hbase.regionserver.flushlogentries=1?
>>
>> It would be interesting to get a baseline for 0.20 which IMO would be
>> settings we had in 0.19 w/ write buffer.  Would be good for comparison.
>>
>> You like the idea of the sync being time-based rather than some number of
>> edits?  I can see fellas wanting both.
>>
>> stack
>>
>>
>>> I stopped HBase and deleted the data dir between each run.
>>>
>>> - hbase.regionserver.flushlogentries=1 and
>>> hbase.regionserver.optionallogflushinterval=1000
>>>  ran in 354765ms
>>>
>>> - hbase.regionserver.flushlogentries=100 and
>>> hbase.regionserver.optionallogflushinterval=1000
>>>  run #1 in 333972ms
>>>  run #2 in 331943ms
>>>
>>> - hbase.regionserver.flushlogentries=1,
>>> hbase.regionserver.optionallogflushinterval=1000 and deferred flush
>>> enabled on TestTable
>>>  run #1 in 309857ms
>>>  run #2 in 311440ms
>>>
>>> So 100 entries per flush takes ~7% less time, deferred flush takes 14%
>>> less.
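
The percentages quoted just above depend on which run is taken as the baseline. A minimal sketch that recomputes them from the standalone timings in this message, averaging the two runs where two were reported. The helper names are made up for illustration; only the timings come from the thread.

```python
from statistics import mean

# Timings in ms copied from the standalone runs quoted above
# (PE sequentialWrite 1, write buffer disabled).
runs_ms = {
    "flushlogentries=1": [354765],
    "flushlogentries=100": [333972, 331943],
    "deferred flush": [309857, 311440],
}


def percent_less_time(baseline_ms, other_ms):
    """How much less time `other` takes, as a share of the baseline run."""
    return 100.0 * (baseline_ms - other_ms) / baseline_ms


def percent_longer(baseline_ms, other_ms):
    """How much longer the baseline takes, as a share of the faster run."""
    return 100.0 * (baseline_ms - other_ms) / other_ms


baseline = mean(runs_ms["flushlogentries=1"])
for name in ("flushlogentries=100", "deferred flush"):
    avg = mean(runs_ms[name])
    print(f"{name}: {percent_less_time(baseline, avg):.1f}% less time; "
          f"baseline takes {percent_longer(baseline, avg):.1f}% longer")
```

Measured against the flushlogentries=1 run, the savings come out at roughly 6% and 12%. Measured the other way around (how much longer flushlogentries=1 takes than each alternative), they come out at about 6.5% and 14%, which is close to the figures quoted above.
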
>>>
>>> I thereby think that not only should we set flushlogentries=1 in 0.21,
>>> but also we should enable deferred log flush by default with a lower
>>> optional log flush interval. It will be a nearly as safe but much
>>> faster alternative to the previous option. I would even get rid of the
>>> hbase.regionserver.flushlogentries config.
>>>
>>> J-D
>>>
>>> On Tue, Nov 17, 2009 at 7:10 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>> > Well it's even better than that ;) We have optional log flushing which
>>> > by default is 10 secs. Make that 100 milliseconds and that's as much
>>> > data as you can lose. If any other table syncs then this table's edits
>>> > are also synced.
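
To make the two loss bounds discussed in this thread concrete (at most N-1 edits with a count-based sync, at most one interval's worth of edits with a time-based sync), here is a toy model. It is only a sketch: `ToyWal` and `worst_case_loss` are hypothetical names, not HBase classes, and the model ignores the cross-table effect mentioned above, where a sync triggered by another table also syncs this table's edits.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ToyWal:
    """Toy write-ahead log. NOT HBase's HLog; it only models when syncs happen."""
    flush_entries: Optional[int] = None      # count-based: sync every N edits
    flush_interval_ms: Optional[int] = None  # time-based: sync every T ms
    unsynced: int = 0       # edits appended since the last sync
    last_sync_ms: int = 0   # time of the most recent sync
    max_unsynced: int = 0   # most edits ever exposed to loss at once

    def append(self, now_ms: int) -> None:
        # A periodic sync that came due before this edit arrived runs first.
        if self.flush_interval_ms is not None:
            elapsed = now_ms - self.last_sync_ms
            if elapsed >= self.flush_interval_ms:
                self.unsynced = 0
                self.last_sync_ms = now_ms - elapsed % self.flush_interval_ms
        self.unsynced += 1
        # A count-based sync fires as soon as the threshold is reached.
        if self.flush_entries is not None and self.unsynced >= self.flush_entries:
            self.unsynced = 0
            self.last_sync_ms = now_ms
        self.max_unsynced = max(self.max_unsynced, self.unsynced)


def worst_case_loss(wal: ToyWal, arrival_times_ms: Iterable[int]) -> int:
    """Feed edit arrival times to the WAL and return the peak unsynced count."""
    for t in arrival_times_ms:
        wal.append(t)
    return wal.max_unsynced


one_edit_per_ms = range(2000)  # assumed workload: one edit every millisecond
print(worst_case_loss(ToyWal(flush_entries=100), one_edit_per_ms))
print(worst_case_loss(ToyWal(flush_entries=1), one_edit_per_ms))
print(worst_case_loss(ToyWal(flush_interval_ms=100), one_edit_per_ms))
print(worst_case_loss(ToyWal(flush_interval_ms=1000), one_edit_per_ms))
```

With the assumed one-edit-per-millisecond workload, the count-based policy exposes at most 99 edits (or none with flushlogentries=1), while the time-based policy exposes as many edits as arrive in one interval, so its bound scales with the write rate rather than with a fixed count.
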
>>> >
>>> > J-D
>>> >
>>> >
>>> > On Tue, Nov 17, 2009 at 4:36 PM, Jonathan Gray <jl...@streamy.com> wrote:
>>> >> Thoughts on a client-facing call to explicitly trigger a WAL sync?  So I
>>> >> could turn on DEFERRED_LOG_FLUSH (possibly leave it on always), run a
>>> >> batch of my inserts, and then run an explicit flush/sync.  The returning
>>> >> of that call would guarantee to the client that the data up to that
>>> >> point is safe.
>>> >>
>>> >> JG
>>> >>
>>> >> On Mon, November 16, 2009 11:00 am, Jean-Daniel Cryans wrote:
>>> >>> I added a new feature for tables called "deferred flush", see
>>> >>> https://issues.apache.org/jira/browse/HBASE-1944
>>> >>>
>>> >>>
>>> >>> My opinion is that the default should be paranoid enough to not lose
>>> >>> any user data. If we can change a table's attribute without taking it
>>> >>> down (there's a jira on that), wouldn't that solve the import problem?
>>> >>>
>>> >>>
>>> >>> For example: have some table that needs to have fast insertion via MR.
>>> >>> During the creation of the job, you change the table's
>>> >>> DEFERRED_LOG_FLUSH to "true", then run the job and finally set the
>>> >>> value to false when the job is done.
>>> >>>
>>> >>> This way you still pass the responsibility to the user but for
>>> >>> performance reasons.
>>> >>>
>>> >>> J-D
>>> >>>
>>> >>>
>>> >>> On Mon, Nov 16, 2009 at 2:05 AM, Cosmin Lehene <cleh...@adobe.com> wrote:
>>> >>>
>>> >>>> We could have a speedy default and an extra parameter for puts that
>>> >>>> would specify a flush is needed. This way you pass the responsibility
>>> >>>> to the user and he can decide if he needs to be paranoid or not. This
>>> >>>> could be part of Put and even specify granularity of the flush if
>>> >>>> needed.
>>> >>>>
>>> >>>>
>>> >>>> Cosmin
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On 11/15/09 6:59 PM, "Andrew Purtell" <apurt...@apache.org> wrote:
>>> >>>>
>>> >>>>
>>> >>>>> I agree with this.
>>> >>>>>
>>> >>>>>
>>> >>>>> I also think we should leave the default as is with the caveat that
>>> >>>>> we call out the durability versus write performance tradeoff in the
>>> >>>>> flushlogentries description and up on the wiki somewhere, maybe on
>>> >>>>> http://wiki.apache.org/hadoop/PerformanceTuning . We could also
>>> >>>>> provide two example configurations, one for performance (reasonable
>>> >>>>> tradeoffs), one for paranoia. I put up an issue:
>>> >>>>> https://issues.apache.org/jira/browse/HBASE-1984
>>> >>>>>
>>> >>>>>
>>> >>>>>     - Andy
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> ________________________________
>>> >>>>> From: Ryan Rawson <ryano...@gmail.com>
>>> >>>>> To: hbase-dev@hadoop.apache.org
>>> >>>>> Sent: Sat, November 14, 2009 11:22:13 PM
>>> >>>>> Subject: Re: Should we change the default value of
>>> >>>>> hbase.regionserver.flushlogentries  for 0.21?
>>> >>>>>
>>> >>>>> That sync at the end of an RPC is my doing. You don't want to sync
>>> >>>>> every _EDIT_, after all, the previous definition of the word "edit"
>>> >>>>> was each KeyValue.  So we could be calling sync for every single
>>> >>>>> column in a row. Bad stuff.
>>> >>>>>
>>> >>>>> In the end, if the regionserver crashes during a batch put, we will
>>> >>>>> never know how much of the batch was flushed to the WAL. Thus it
>>> >>>>> makes sense to only do it once and get a massive, massive, speedup.
>>> >>>>>
>>> >>>>> On Sat, Nov 14, 2009 at 9:45 PM, stack <st...@duboce.net> wrote:
>>> >>>>>
>>> >>>>>> I'm for leaving it as it is, at every 100 edits -- maybe every 10
>>> >>>>>> edits? Speed stays as it was.  We used to lose MBs.  By default,
>>> >>>>>> we'll now lose 99 or 9 edits max.
>>> >>>>>>
>>> >>>>>> We need to do some work bringing folks along regardless of what we
>>> >>>>>> decide. Flush happens at the end of the put up in the regionserver.
>>> >>>>>> If you are doing a batch of commits -- e.g. using a big write buffer
>>> >>>>>> over on your client -- the puts will only be flushed on the way out
>>> >>>>>> after the batch put completes EVEN if you have configured hbase to
>>> >>>>>> sync every edit (I ran into this this evening.  J-D sorted me out).
>>> >>>>>> We need to make sure folks are up on this.
>>> >>>>>>
>>> >>>>>> St.Ack
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>> Hi dev!
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> Hadoop 0.21 now has a reliable append and flush feature and this
>>> >>>>>>> gives us the opportunity to review some assumptions. The current
>>> >>>>>>> situation:
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> - Every edit going to a catalog table is flushed so there's no
>>> >>>>>>> data loss.
>>> >>>>>>> - The user tables' edits are flushed every
>>> >>>>>>> hbase.regionserver.flushlogentries edits, which by default is 100.
>>> >>>>>>>
>>> >>>>>>> Should we now set this value to 1 in order to have more durable
>>> >>>>>>> but slower inserts by default? Please speak up.
>>> >>>>>>>
>>> >>>>>>> Thx,
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> J-D
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>
>>> >>>>
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >
>>>
>>
>
