OK, to make sure I get this right:

- we enable deferred log flush by default
- we set flushlogentries=1

Also, since 10 seconds is kind of a huge window, I propose that:

- we set optionallogflushinterval=1000

which is the MySQL default. We also have to update the wiki (there's
already an entry on deferred log flush) to add the flushlogentries
configuration.
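
To make this concrete, here's a rough sketch of the proposed settings
(illustration only -- in a real deployment these would live in
hbase-site.xml on the region servers, and the property names should be
double-checked against hbase-default.xml):

  import org.apache.hadoop.hbase.HBaseConfiguration;

  public class ProposedDefaults {
    public static void main(String[] args) {
      HBaseConfiguration conf = new HBaseConfiguration();

      // Sync every edit for tables that are NOT marked DEFERRED_LOG_FLUSH.
      conf.setInt("hbase.regionserver.flushlogentries", 1);

      // Background WAL sync every 1000ms (down from the 10s default) for
      // tables that ARE marked DEFERRED_LOG_FLUSH.
      conf.setLong("hbase.regionserver.optionallogflushinterval", 1000);
    }
  }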

I'll open a jira.

J-D

On Fri, Dec 11, 2009 at 5:26 PM, stack <st...@duboce.net> wrote:
> Yeah, +1 on deferred log flush.  Good man J-D.
>
> Can we also update performance wiki page to list how to up your write speed
> at cost of possible increased edit loss?
>
> St.Ack
>
>
> On Fri, Dec 11, 2009 at 1:35 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>
>> Looks like deferred log flush is the clear winner here, and probably
>> has a smaller chance of loss than the 100 logflushentries.
>>
>> I dare say we should ship with that as the default...
>>
>> -ryan
>>
>> On Thu, Dec 10, 2009 at 6:02 PM, Jean-Daniel Cryans <jdcry...@apache.org>
>> wrote:
>> > So, to satisfy Ryan's thirst for cluster numbers, here they are:
>> >
>> > Default (with write buffer)
>> > 65,060ms
>> >
>> > The rest is without the write buffer (which is so well optimized that
>> > we only sync once per 2MB batch). I only ran the entries=1 case once
>> > because it takes so long.
>> >
>> > 1 logflushentries
>> > 2,188,737ms
>> >
>> > 100 logflushentries
>> > 697,590ms
>> > 698,082ms
>> >
>> > deferred log flush
>> > 545,836ms
>> > 532,788ms
>> >
>> > The cluster is composed of 15 i7s (a bit overkill), but the numbers show
>> > that it runs much slower than standalone because of network, replication, etc.
>> >
>> > Also on another cluster (same hardware) I did some 0.20 testing:
>> >
>> > With write buffer:
>> > 131,811ms
>> >
>> > Without:
>> > 602,842ms
>> >
>> > Keep in mind that the sync we call isn't HDFS-265.
>> >
>> > J-D
>> >
>> > On Thu, Dec 3, 2009 at 9:53 PM, stack <st...@duboce.net> wrote:
>> >> Thanks for picking up this discussion again J-D.
>> >>
>> >> See below.
>> >>
>> >> On Thu, Dec 3, 2009 at 3:24 PM, Jean-Daniel Cryans <jdcry...@apache.org>
>> >> wrote:
>> >>
>> >>> I have the feeling that this discussion isn't over, there's no
>> >>> consensus yet, so I did some tests to get some numbers.
>> >>>
>> >>> PE sequentialWrite 1 with the write buffer disabled (with it enabled I
>> >>> get the same numbers on every config), on a standalone setup.
>> >>
>> >>
>> >> The write buffer is disabled because otherwise it will get in the way of
>> >> the hbase.regionserver.flushlogentries=1?
>> >>
>> >> It would be interesting to get a baseline for 0.20 which IMO would be
>> >> settings we had in 0.19 w/ write buffer.  Would be good for comparison.
>> >>
>> >> You like the idea of the sync being time-based rather than some number of
>> >> edits?  I can see fellas wanting both.
>> >>
>> >> stack
>> >>
>> >>
>> >>> I stopped HBase and deleted the data dir between each run.
>> >>>
>> >>> - hbase.regionserver.flushlogentries=1 and
>> >>> hbase.regionserver.optionallogflushinterval=1000
>> >>>  ran in 354765ms
>> >>>
>> >>> - hbase.regionserver.flushlogentries=100 and
>> >>> hbase.regionserver.optionallogflushinterval=1000
>> >>>  run #1 in 333972ms
>> >>>  run #2 in 331943ms
>> >>>
>> >>> - hbase.regionserver.flushlogentries=1,
>> >>> hbase.regionserver.optionallogflushinterval=1000 and deferred flush
>> >>> enabled on TestTable
>> >>>  run #1 in 309857ms
>> >>>  run #2 in 311440ms
>> >>>
>> >>> So 100 entries per flush is ~7% faster, and deferred flush is ~14%
>> >>> faster.
>> >>>
>> >>> I therefore think that not only should we set flushlogentries=1 in 0.21,
>> >>> but we should also enable deferred log flush by default with a lower
>> >>> optional log flush interval. It will be a nearly-as-safe but much
>> >>> faster alternative to the previous option. I would even get rid of the
>> >>> hbase.regionserver.flushlogentries config.
>> >>>
>> >>> J-D
>> >>>
>> >>> On Tue, Nov 17, 2009 at 7:10 PM, Jean-Daniel Cryans <jdcry...@apache.org>
>> >>> wrote:
>> >>> > Well it's even better than that ;) We have optional log flushing which
>> >>> > by default is 10 secs. Make that 100 milliseconds and that's as much
>> >>> > data as you can lose. If any other table syncs then this table's edits
>> >>> > are also synced.
>> >>> >
>> >>> > J-D
>> >>> >
>> >>> >
>> >>> > On Tue, Nov 17, 2009 at 4:36 PM, Jonathan Gray <jl...@streamy.com>
>> >>> > wrote:
>> >>> >> Thoughts on a client-facing call to explicitly call a WAL sync?  So I
>> >>> >> could turn on DEFERRED_LOG_FLUSH (possibly leave it on always), run a
>> >>> >> batch of my inserts, and then run an explicit flush/sync.  The return
>> >>> >> of that call would guarantee to the client that the data up to that
>> >>> >> point is safe.
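>> >>> >>
>> >>> >> Something like the following, roughly (the syncWal() call below is
>> >>> >> hypothetical, nothing like it exists today; the rest is the normal
>> >>> >> client API):
>> >>> >>
>> >>> >>   HTable table = new HTable(new HBaseConfiguration(), "mytable");
>> >>> >>   table.setAutoFlush(false);           // batch on the client side too
>> >>> >>   for (Put put : myBatch) {            // myBatch is the app's own list
>> >>> >>     table.put(put);
>> >>> >>   }
>> >>> >>   table.flushCommits();  // ship the client write buffer to the servers
>> >>> >>   table.syncWal();       // HYPOTHETICAL: return once the WAL is synced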
>> >>> >>
>> >>> >> JG
>> >>> >>
>> >>> >> On Mon, November 16, 2009 11:00 am, Jean-Daniel Cryans wrote:
>> >>> >>> I added a new feature for tables called "deferred flush", see
>> >>> >>> https://issues.apache.org/jira/browse/HBASE-1944
>> >>> >>>
>> >>> >>>
>> >>> >>> My opinion is that the default should be paranoid enough to not lose
>> >>> >>> any user data. If we can change a table's attribute without taking it
>> >>> >>> down (there's a jira on that), wouldn't that solve the import problem?
>> >>> >>>
>> >>> >>>
>> >>> >>> For example: say you have a table that needs fast insertion via MR.
>> >>> >>> During the creation of the job, you change the table's
>> >>> >>> DEFERRED_LOG_FLUSH to "true", then run the job and finally set the
>> >>> >>> value back to false when the job is done.
>> >>> >>>
>> >>> >>> This way you still pass the responsibility to the user, but for
>> >>> >>> performance reasons.
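>> >>> >>>
>> >>> >>> Roughly something like this in the job setup (sketch only -- today
>> >>> >>> this means a disable/enable cycle, and the exact HBaseAdmin calls
>> >>> >>> may differ a bit by version):
>> >>> >>>
>> >>> >>>   HBaseAdmin admin = new HBaseAdmin(conf);
>> >>> >>>   byte[] name = Bytes.toBytes("mytable");
>> >>> >>>   // Turn deferred log flush on before the import...
>> >>> >>>   admin.disableTable(name);
>> >>> >>>   HTableDescriptor htd = admin.getTableDescriptor(name);
>> >>> >>>   htd.setValue("DEFERRED_LOG_FLUSH", "true");
>> >>> >>>   admin.modifyTable(name, htd);
>> >>> >>>   admin.enableTable(name);
>> >>> >>>   // ...run the MR job, then do the same dance with "false" at the end.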
>> >>> >>>
>> >>> >>> J-D
>> >>> >>>
>> >>> >>>
>> >>> >>> On Mon, Nov 16, 2009 at 2:05 AM, Cosmin Lehene <cleh...@adobe.com>
>> >>> >>> wrote:
>> >>> >>>
>> >>> >>>> We could have a speedy default and an extra parameter for puts that
>> >>> >>>> would specify a flush is needed. This way you pass the responsibility
>> >>> >>>> to the user and he can decide if he needs to be paranoid or not. This
>> >>> >>>> could be part of Put and even specify granularity of the flush if
>> >>> >>>> needed.
>> >>> >>>>
>> >>> >>>>
>> >>> >>>> Cosmin
>> >>> >>>>
>> >>> >>>>
>> >>> >>>>
>> >>> >>>> On 11/15/09 6:59 PM, "Andrew Purtell" <apurt...@apache.org> wrote:
>> >>> >>>>
>> >>> >>>>
>> >>> >>>>> I agree with this.
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>> I also think we should leave the default as is with the caveat that
>> >>> >>>>> we call out the durability versus write performance tradeoff in the
>> >>> >>>>> flushlogentries description and up on the wiki somewhere, maybe on
>> >>> >>>>> http://wiki.apache.org/hadoop/PerformanceTuning . We could also
>> >>> >>>>> provide two example configurations, one for performance (reasonable
>> >>> >>>>> tradeoffs), one for paranoia. I put up an issue:
>> >>> >>>>> https://issues.apache.org/jira/browse/HBASE-1984
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>>     - Andy
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>> ________________________________
>> >>> >>>>> From: Ryan Rawson <ryano...@gmail.com>
>> >>> >>>>> To: hbase-dev@hadoop.apache.org
>> >>> >>>>> Sent: Sat, November 14, 2009 11:22:13 PM
>> >>> >>>>> Subject: Re: Should we change the default value of
>> >>> >>>>> hbase.regionserver.flushlogentries  for 0.21?
>> >>> >>>>>
>> >>> >>>>> That sync at the end of an RPC is my doing. You don't want to sync
>> >>> >>>>> every _EDIT_; after all, the previous definition of the word "edit"
>> >>> >>>>> was each KeyValue.  So we could be calling sync for every single
>> >>> >>>>> column in a row. Bad stuff.
>> >>> >>>>>
>> >>> >>>>> In the end, if the regionserver crashes during a batch put, we will
>> >>> >>>>> never know how much of the batch was flushed to the WAL. Thus it
>> >>> >>>>> makes sense to only sync once and get a massive, massive speedup.
>> >>> >>>>>
>> >>> >>>>> On Sat, Nov 14, 2009 at 9:45 PM, stack <st...@duboce.net> wrote:
>> >>> >>>>>
>> >>> >>>>>> I'm for leaving it as it is, at every 100 edits -- maybe every 10
>> >>> >>>>>> edits? Speed stays as it was.  We used to lose MBs.  By default,
>> >>> >>>>>> we'll now lose 99 or 9 edits max.
>> >>> >>>>>>
>> >>> >>>>>> We need to do some work bringing folks along regardless of what we
>> >>> >>>>>> decide. Flush happens at the end of the put up in the regionserver.
>> >>> >>>>>> If you are doing a batch of commits -- e.g. using a big write buffer
>> >>> >>>>>> over on your client -- the puts will only be flushed on the way out
>> >>> >>>>>> after the batch put completes EVEN if you have configured hbase to
>> >>> >>>>>> sync every edit (I ran into this this evening.  J-D sorted me out).
>> >>> >>>>>> We need to make sure folks are up on this.
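>> >>> >>>>>>
>> >>> >>>>>> The client-side pattern in question looks roughly like this (a
>> >>> >>>>>> sketch; the 2MB buffer size is just what I believe the default is):
>> >>> >>>>>>
>> >>> >>>>>>   HTable table = new HTable(new HBaseConfiguration(), "TestTable");
>> >>> >>>>>>   table.setAutoFlush(false);                 // use the write buffer
>> >>> >>>>>>   table.setWriteBufferSize(2 * 1024 * 1024); // ~2MB
>> >>> >>>>>>   for (Put put : myPuts) {    // buffered locally, nothing sent yet
>> >>> >>>>>>     table.put(put);
>> >>> >>>>>>   }
>> >>> >>>>>>   table.flushCommits();  // only here do the puts -- and any WAL
>> >>> >>>>>>                          // syncs -- actually happen on the servers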
>> >>> >>>>>>
>> >>> >>>>>> St.Ack
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans
>> >>> >>>>>> <jdcry...@apache.org> wrote:
>> >>> >>>>>>
>> >>> >>>>>>
>> >>> >>>>>>> Hi dev!
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>> Hadoop 0.21 now has a reliable append and flush feature and this
>> >>> >>>>>>> gives us the opportunity to review some assumptions. The current
>> >>> >>>>>>> situation:
>> >>> >>>>>>>
>> >>> >>>>>>> - Every edit going to a catalog table is flushed so there's no
>> >>> >>>>>>>   data loss.
>> >>> >>>>>>> - The user table edits are flushed every
>> >>> >>>>>>>   hbase.regionserver.flushlogentries, which by default is 100.
>> >>> >>>>>>>
>> >>> >>>>>>> Should we now set this value to 1 in order to have more durable
>> >>> >>>>>>> but slower inserts by default? Please speak up.
>> >>> >>>>>>>
>> >>> >>>>>>> Thx,
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>> J-D
>> >>> >>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>>
>> >>> >>>>
>> >>> >>>>
>> >>> >>>
>> >>> >>>
>> >>> >>
>> >>> >>
>> >>> >
>> >>>
>> >>
>> >
>>
>
