Re: Should we change the default value of hbase.regionserver.flushlogentries for 0.21?

stack Fri, 11 Dec 2009 17:36:13 -0800

Sounds good.
St.Ack

On Fri, Dec 11, 2009 at 5:33 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:


> Ok to make sure I get this right:
>
> - we enable deferred log flush by default
> - we set flushlogentries=1
>
> Also since 10 seconds is kind of a huge window I propose that:
>
> - we set optionalLogFlush=1000
>
> which is the MySQL default. We also have to update the wiki (there's
> already an entry on deferred log flush) by adding the configuration of
> flushlogentries.
>
> I'll open a jira.
>
> J-D
>
> On Fri, Dec 11, 2009 at 5:26 PM, stack <st...@duboce.net> wrote:
> > Yeah, +1 on deferred log flush.  Good man J-D.
> >
> > Can we also update the performance wiki page to list how to up your write
> > speed at the cost of possible increased edit loss?
> >
> > St.Ack
> >
> >
> > On Fri, Dec 11, 2009 at 1:35 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> >
> >> Looks like deferred log flush is the clear winner here, and probably
> >> has a smaller chance of loss than the 100 logflushentries.
> >>
> >> I dare say we should ship with that as the default...
> >>
> >> -ryan
> >>
> >> On Thu, Dec 10, 2009 at 6:02 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >> > So to satisfy Ryan's thirst for cluster numbers, here they are:
> >> >
> >> > Default (with write buffer)
> >> > 65 060ms
> >> >
> >> > The rest is without the write buffer (which is so well optimized that
> >> > we only sync once per 2MB batch). I ran it once with entries=1 because
> >> > it's taking so long.
> >> >
> >> > 1 logflushentries
> >> > 2 188 737ms
> >> >
> >> > 100 logflushentries
> >> > 697 590ms
> >> > 698 082ms
> >> >
> >> > deferred log flush
> >> > 545 836ms
> >> > 532 788ms
> >> >
> >> > The cluster is composed of 15 i7s (a bit overkill) but it shows that
> >> > it runs much slower because of network, replication, etc.
> >> >
> >> > Also on another cluster (same hardware) I did some 0.20 testing:
> >> >
> >> > With write buffer:
> >> > 131 811ms
> >> >
> >> > Without:
> >> > 602 842ms
> >> >
> >> > Keep in mind that the sync we call isn't HDFS-265.
> >> >
> >> > J-D
> >> >
> >> > On Thu, Dec 3, 2009 at 9:53 PM, stack <st...@duboce.net> wrote:
> >> >> Thanks for picking up this discussion again J-D.
> >> >>
> >> >> See below.
> >> >>
> >> >> On Thu, Dec 3, 2009 at 3:24 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >> >>
> >> >>> I have the feeling that this discussion isn't over, there's no
> >> >>> consensus yet, so I did some tests to get some numbers.
> >> >>>
> >> >>> PE sequentialWrite 1 with the write buffer disabled (I get the same
> >> >>> numbers on every different config with it) on a standalone setup.
> >> >>
> >> >>
> >> >> The write buffer is disabled because otherwise it will get in the way of
> >> >> the hbase.regionserver.flushlogentries=1?
> >> >>
> >> >> It would be interesting to get a baseline for 0.20 which IMO would be
> >> >> settings we had in 0.19 w/ write buffer.  Would be good for comparison.
> >> >>
> >> >> You like the idea of the sync being time-based rather than some number of
> >> >> edits?  I can see fellas wanting both.
> >> >>
> >> >> stack
> >> >>
> >> >>
> >> >>> I stopped HBase and deleted the data dir between each run.
> >> >>>
> >> >>> - hbase.regionserver.flushlogentries=1 and
> >> >>> hbase.regionserver.optionallogflushinterval=1000
> >> >>>  ran in 354765ms
> >> >>>
> >> >>> - hbase.regionserver.flushlogentries=100 and
> >> >>> hbase.regionserver.optionallogflushinterval=1000
> >> >>>  run #1 in 333972ms
> >> >>>  run #2 in 331943ms
> >> >>>
> >> >>> - hbase.regionserver.flushlogentries=1,
> >> >>> hbase.regionserver.optionallogflushinterval=1000 and deferred flush
> >> >>> enabled on TestTable
> >> >>>  run #1 in 309857ms
> >> >>>  run #2 in 311440ms
> >> >>>
> >> >>> So 100 entries per flush takes ~7% less time, deferred flush takes 14%
> >> >>> less.
> >> >>>
> >> >>> I therefore think that not only should we set flushlogentries=1 in 0.21,
> >> >>> but also we should enable deferred log flush by default with a lower
> >> >>> optional log flush interval. It will be a nearly-as-safe but much
> >> >>> faster alternative to the previous option. I would even get rid of the
> >> >>> hbase.regionserver.flushlogentries config.
> >> >>>
> >> >>> J-D
> >> >>>
> >> >>> On Tue, Nov 17, 2009 at 7:10 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >> >>> > Well it's even better than that ;) We have optional log flushing which
> >> >>> > by default is 10 secs. Make that 100 milliseconds and that's as much
> >> >>> > data as you can lose. If any other table syncs then this table's edits
> >> >>> > are also synced.
> >> >>> >
> >> >>> > J-D
> >> >>> >
> >> >>> >
> >> >>> > On Tue, Nov 17, 2009 at 4:36 PM, Jonathan Gray <jl...@streamy.com> wrote:
> >> >>> >> Thoughts on a client-facing call to explicitly request a WAL sync?  So I
> >> >>> >> could turn on DEFERRED_LOG_FLUSH (possibly leave it on always), run a
> >> >>> >> batch of my inserts, and then run an explicit flush/sync.  The returning
> >> >>> >> of that call would guarantee to the client that the data up to that
> >> >>> >> point is safe.
> >> >>> >>
> >> >>> >> JG
> >> >>> >>
> >> >>> >> On Mon, November 16, 2009 11:00 am, Jean-Daniel Cryans wrote:
> >> >>> >>> I added a new feature for tables called "deferred flush", see
> >> >>> >>> https://issues.apache.org/jira/browse/HBASE-1944
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> My opinion is that the default should be paranoid enough to not lose
> >> >>> >>> any user data. If we can change a table's attribute without taking it
> >> >>> >>> down (there's a jira on that), wouldn't that solve the import problem?
> >> >>> >>>
> >> >>> >>> For example: have some table that needs to have fast insertion via MR.
> >> >>> >>> During the creation of the job, you change the table's
> >> >>> >>> DEFERRED_LOG_FLUSH to "true", then run the job and finally set the
> >> >>> >>> value to false when the job is done.
> >> >>> >>>
> >> >>> >>> This way you still pass the responsibility to the user but for
> >> >>> >>> performance reasons.
> >> >>> >>>
> >> >>> >>> J-D
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> On Mon, Nov 16, 2009 at 2:05 AM, Cosmin Lehene <cleh...@adobe.com> wrote:
> >> >>> >>>
> >> >>> >>>> We could have a speedy default and an extra parameter for puts that
> >> >>> >>>> would specify a flush is needed. This way you pass the responsibility to
> >> >>> >>>> the user and he can decide if he needs to be paranoid or not. This could
> >> >>> >>>> be part of Put and even specify granularity of the flush if needed.
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>> Cosmin
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>> On 11/15/09 6:59 PM, "Andrew Purtell" <apurt...@apache.org> wrote:
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>>> I agree with this.
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>> I also think we should leave the default as is with the caveat that
> >> >>> >>>>> we call out the durability versus write performance tradeoff in the
> >> >>> >>>>> flushlogentries description and up on the wiki somewhere, maybe on
> >> >>> >>>>> http://wiki.apache.org/hadoop/PerformanceTuning . We could also
> >> >>> >>>>> provide two example configurations, one for performance (reasonable
> >> >>> >>>>> tradeoffs), one for paranoia. I put up an issue:
> >> >>> >>>>> https://issues.apache.org/jira/browse/HBASE-1984
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>>     - Andy
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>> ________________________________
> >> >>> >>>>> From: Ryan Rawson <ryano...@gmail.com>
> >> >>> >>>>> To: hbase-dev@hadoop.apache.org
> >> >>> >>>>> Sent: Sat, November 14, 2009 11:22:13 PM
> >> >>> >>>>> Subject: Re: Should we change the default value of
> >> >>> >>>>> hbase.regionserver.flushlogentries  for 0.21?
> >> >>> >>>>>
> >> >>> >>>>> That sync at the end of an RPC is my doing. You don't want to sync
> >> >>> >>>>> every _EDIT_; after all, the previous definition of the word "edit"
> >> >>> >>>>> was each KeyValue.  So we could be calling sync for every single
> >> >>> >>>>> column in a row. Bad stuff.
> >> >>> >>>>>
> >> >>> >>>>> In the end, if the regionserver crashes during a batch put, we will
> >> >>> >>>>> never know how much of the batch was flushed to the WAL. Thus it makes
> >> >>> >>>>> sense to only do it once and get a massive, massive speedup.
> >> >>> >>>>>
> >> >>> >>>>> On Sat, Nov 14, 2009 at 9:45 PM, stack <st...@duboce.net> wrote:
> >> >>> >>>>>
> >> >>> >>>>>> I'm for leaving it as it is, at every 100 edits -- maybe every 10
> >> >>> >>>>>> edits? Speed stays as it was.  We used to lose MBs.  By default,
> >> >>> >>>>>> we'll now lose 99 or 9 edits max.
> >> >>> >>>>>>
> >> >>> >>>>>> We need to do some work bringing folks along regardless of what we
> >> >>> >>>>>> decide. Flush happens at the end of the put up in the regionserver.
> >> >>> >>>>>> If you are doing a batch of commits -- e.g. using a big write buffer
> >> >>> >>>>>> over on your client -- the puts will only be flushed on the way out
> >> >>> >>>>>> after the batch put completes EVEN if you have configured hbase to
> >> >>> >>>>>> sync every edit (I ran into this this evening.  J-D sorted me out).
> >> >>> >>>>>> We need to make sure folks are up on this.
> >> >>> >>>>>>
> >> >>> >>>>>> St.Ack
> >> >>> >>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >> >>> >>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>>> Hi dev!
> >> >>> >>>>>>>
> >> >>> >>>>>>>
> >> >>> >>>>>>> Hadoop 0.21 now has a reliable append and flush feature and this
> >> >>> >>>>>>> gives us the opportunity to review some assumptions. The current
> >> >>> >>>>>>> situation:
> >> >>> >>>>>>>
> >> >>> >>>>>>>
> >> >>> >>>>>>> - Every edit going to a catalog table is flushed so there's no
> >> >>> >>>>>>>   data loss.
> >> >>> >>>>>>> - The user tables edits are flushed every
> >> >>> >>>>>>>   hbase.regionserver.flushlogentries which by default is 100.
> >> >>> >>>>>>>
> >> >>> >>>>>>> Should we now set this value to 1 in order to have more durable
> >> >>> >>>>>>> but slower inserts by default? Please speak up.
> >> >>> >>>>>>>
> >> >>> >>>>>>> Thx,
> >> >>> >>>>>>>
> >> >>> >>>>>>>
> >> >>> >>>>>>> J-D
> >> >>> >>>>>>>
> >> >>> >>>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>
> >> >>> >>
> >> >>> >
> >> >>>
> >> >>
> >> >
> >>
> >
>
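
A quick way to re-check the relative timings quoted for the standalone
PE sequentialWrite runs (flushlogentries=1 vs. 100 vs. deferred log flush).
This is only a sketch: the millisecond figures are copied from the thread, but
the names (runs_ms, compare) are made up for illustration. It prints each
difference two ways, since "X% less time" depends on which run is taken as the
base. The "~7%" and "14%" in the thread appear to match the second figure (how
much longer flushlogentries=1 takes than the faster run), while the reduction
measured against flushlogentries=1 itself comes out a little lower.

```python
# Wall-clock times in milliseconds, copied from the standalone runs in the thread.
runs_ms = {
    "flushlogentries=1": [354765],
    "flushlogentries=100": [333972, 331943],
    "deferred log flush": [309857, 311440],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def compare(name, baseline_name="flushlogentries=1"):
    """Return (percent less time than baseline, percent longer the baseline takes)."""
    baseline = mean(runs_ms[baseline_name])
    other = mean(runs_ms[name])
    saved = baseline - other
    return saved / baseline * 100, saved / other * 100


if __name__ == "__main__":
    for name in ("flushlogentries=100", "deferred log flush"):
        less, longer = compare(name)
        print(f"{name}: {less:.1f}% less time than flushlogentries=1 "
              f"(flushlogentries=1 takes {longer:.1f}% longer)")
```

Either way the ordering is the same: deferred log flush is the fastest of the
three settings on this setup.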

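The durability side of the tradeoff discussed in the thread can be put in
rough numbers. The sketch below is a toy model, not HBase code: a count-based
policy (sync every N entries) can leave at most N-1 edits unsynced, which is
the "99 or 9 edits max" mentioned above, while a purely time-based policy
(deferred log flush) can leave up to one interval's worth of edits unsynced.
The function names and the write rate of 3000 edits/second are assumptions
chosen only for illustration; the model also ignores that a sync triggered by
any other table flushes this table's edits too, which would shrink the window.

```python
def max_unsynced_count_based(flush_log_entries):
    """Worst case when syncing every N entries: crash just before the Nth edit."""
    if flush_log_entries < 1:
        raise ValueError("flush_log_entries must be >= 1")
    return flush_log_entries - 1


def max_unsynced_time_based(interval_ms, edits_per_second):
    """Worst case when syncing on a timer: crash just before the timer fires."""
    if interval_ms < 0 or edits_per_second < 0:
        raise ValueError("arguments must be non-negative")
    return interval_ms * edits_per_second // 1000


if __name__ == "__main__":
    assumed_rate = 3000  # edits per second; hypothetical, not measured in the thread

    for entries in (100, 10, 1):
        print(f"flushlogentries={entries}: up to "
              f"{max_unsynced_count_based(entries)} edits at risk")

    for interval_ms in (10000, 1000, 100):
        print(f"optional log flush every {interval_ms} ms: up to "
              f"{max_unsynced_time_based(interval_ms, assumed_rate)} edits at risk")
```

With a time-based policy the exposure scales with the write rate, so the
shorter intervals proposed in the thread (1000 ms or 100 ms instead of 10 s)
matter most for heavily written tables.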