Sounds good.

St.Ack

On Fri, Dec 11, 2009 at 5:33 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> Ok to make sure I get this right:
>
> - we enable deferred log flush by default
> - we set flushlogentries=1
>
> Also since 10 seconds is kind of a huge window I propose that:
>
> - we set optionalLogFlush=1000
>
> which is the MySQL default. We also have to update the wiki (there's
> already an entry on deferred log flush) by adding the configuration of
> flushlogentries.
>
> I'll open a jira.
>
> J-D
>
> On Fri, Dec 11, 2009 at 5:26 PM, stack <st...@duboce.net> wrote:
> > Yeah, +1 on deferred log flush. Good man J-D.
> >
> > Can we also update performance wiki page to list how to up your write speed
> > at cost of possible increased edit loss?
> >
> > St.Ack
> >
> >
> > On Fri, Dec 11, 2009 at 1:35 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> >
> >> Looks like deferred log flush is the clear winner here, and probably
> >> has a smaller chance of loss than the 100 logflushentries.
> >>
> >> I dare say we should ship with that as the default...
> >>
> >> -ryan
> >>
> >> On Thu, Dec 10, 2009 at 6:02 PM, Jean-Daniel Cryans <jdcry...@apache.org>
> >> wrote:
> >> > So to satisfy Ryan's thirst of cluster number, here they are:
> >> >
> >> > Default (with write buffer)
> >> > 65 060ms
> >> >
> >> > The rest is without the write buffer (which is so well optimized that
> >> > we only sync once per 2MB batch). I ran it once with entries=1 because
> >> > it's taking so long.
> >> >
> >> > 1 logflushentries
> >> > 2 188 737ms
> >> >
> >> > 100 logflushentries
> >> > 697 590ms
> >> > 698 082ms
> >> >
> >> > deferred log flush
> >> > 545 836ms
> >> > 532 788ms
> >> >
> >> > The cluster is composed of 15 i7s (a bit overkill) but it shows that
> >> > it runs much slower because of network, replication, etc.
> >> >
> >> > Also on another cluster (same hardware) I did some 0.20 testing:
> >> >
> >> > With write buffer:
> >> > 131 811ms
> >> >
> >> > Without:
> >> > 602 842ms
> >> >
> >> > Keep in mind that the sync we call isn't HDFS-265.
> >> >
> >> > J-D
> >> >
> >> > On Thu, Dec 3, 2009 at 9:53 PM, stack <st...@duboce.net> wrote:
> >> >> Thanks for picking up this discussion again J-D.
> >> >>
> >> >> See below.
> >> >>
> >> >> On Thu, Dec 3, 2009 at 3:24 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >> >>
> >> >>> I have the feeling that this discussion isn't over, there's no
> >> >>> consensus yet, so I did some tests to get some numbers.
> >> >>>
> >> >>> PE sequentialWrite 1 with the write buffer disabled (I get the same
> >> >>> numbers on every different config with it) on a standalone setup.
> >> >>
> >> >>
> >> >> The write buffer is disabled because otherwise it will get in the way of the
> >> >> hbase.regionserver.flushlogentries=1?
> >> >>
> >> >> It would be interesting to get a baseline for 0.20 which IMO would be
> >> >> settings we had in 0.19 w/ write buffer. Would be good for comparison.
> >> >>
> >> >> You like the idea of the sync being time-based rather than some number of
> >> >> edits? I can see fellas wanting both.
> >> >>
> >> >> stack
> >> >>
> >> >>
> >> >>> I stopped HBase and deleted the data dir between each run.
> >> >>>
> >> >>> - hbase.regionserver.flushlogentries=1 and
> >> >>> hbase.regionserver.optionallogflushinterval=1000
> >> >>> ran in 354765ms
> >> >>>
> >> >>> - hbase.regionserver.flushlogentries=100 and
> >> >>> hbase.regionserver.optionallogflushinterval=1000
> >> >>> run #1 in 333972ms
> >> >>> run #2 in 331943ms
> >> >>>
> >> >>> - hbase.regionserver.flushlogentries=1,
> >> >>> hbase.regionserver.optionallogflushinterval=1000 and deferred flush
> >> >>> enabled on TestTable
> >> >>> run #1 in 309857ms
> >> >>> run #2 in 311440ms
> >> >>>
> >> >>> So 100 entries per flush takes ~7% less time, deferred flush takes 14%
> >> >>> less.
> >> >>>
> >> >>> I thereby think that not only should we set flushlogentries=1 in 0.21,
> >> >>> but also we should enable deferred log flush by default with a lower
> >> >>> optional log flush interval. It will be a nearly as safe but much
> >> >>> faster alternative to the previous option. I would even get rid of the
> >> >>> hbase.regionserver.flushlogentries config.
> >> >>>
> >> >>> J-D
> >> >>>
> >> >>> On Tue, Nov 17, 2009 at 7:10 PM, Jean-Daniel Cryans <jdcry...@apache.org>
> >> >>> wrote:
> >> >>> > Well it's even better than that ;) We have optional log flushing which
> >> >>> > by default is 10 secs. Make that 100 milliseconds and that's as much
> >> >>> > data you can lose. If any other table syncs then this table's edits
> >> >>> > are also synced.
> >> >>> >
> >> >>> > J-D
> >> >>> >
> >> >>> >
> >> >>> > On Tue, Nov 17, 2009 at 4:36 PM, Jonathan Gray <jl...@streamy.com> wrote:
> >> >>> >> Thoughts on a client-facing call to explicit call a WAL sync? So I could
> >> >>> >> turn on DEFERRED_LOG_FLUSH (possibly leave it on always), run a batch of
> >> >>> >> my inserts, and then run an explicit flush/sync. The returning of that
> >> >>> >> call would guarantee to the client that the data up to that point is safe.
> >> >>> >>
> >> >>> >> JG
> >> >>> >>
> >> >>> >> On Mon, November 16, 2009 11:00 am, Jean-Daniel Cryans wrote:
> >> >>> >>> I added a new feature for tables called "deferred flush", see
> >> >>> >>> https://issues.apache.org/jira/browse/HBASE-1944
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> My opinion is that the default should be paranoid enough to not lose
> >> >>> >>> any user data. If we can change a table's attribute without taking it down
> >> >>> >>> (there's a jira on that), wouldn't that solve the import problem?
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> For example: have some table that needs to have fast insertion via MR.
> >> >>> >>> During the creation of the job, you change the table's
> >> >>> >>> DEFERRED_LOG_FLUSH to "true", then run the job and finally set the
> >> >>> >>> value to false when the job is done.
> >> >>> >>>
> >> >>> >>> This way you still pass the responsibility to the user but for
> >> >>> >>> performance reasons.
> >> >>> >>>
> >> >>> >>> J-D
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> On Mon, Nov 16, 2009 at 2:05 AM, Cosmin Lehene <cleh...@adobe.com> wrote:
> >> >>> >>>
> >> >>> >>>> We could have a speedy default and an extra parameter for puts that
> >> >>> >>>> would specify a flush is needed. This way you pass the responsibility to
> >> >>> >>>> the user and he can decide if he needs to be paranoid or not. This could
> >> >>> >>>> be part of Put and even specify granularity of the flush if needed.
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>> Cosmin
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>> On 11/15/09 6:59 PM, "Andrew Purtell" <apurt...@apache.org> wrote:
> >> >>> >>>>
> >> >>> >>>>
> >> >>> >>>>> I agree with this.
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>> I also think we should leave the default as is with the caveat that
> >> >>> >>>>> we call out the durability versus write performance tradeoff in the
> >> >>> >>>>> flushlogentries description and up on the wiki somewhere, maybe on
> >> >>> >>>>> http://wiki.apache.org/hadoop/PerformanceTuning . We could also
> >> >>> >>>>> provide two example configurations, one for performance (reasonable
> >> >>> >>>>> tradeoffs), one for paranoia.
> >> >>> >>>>> I put up an issue:
> >> >>> >>>>> https://issues.apache.org/jira/browse/HBASE-1984
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>> - Andy
> >> >>> >>>>>
> >> >>> >>>>>
> >> >>> >>>>> ________________________________
> >> >>> >>>>> From: Ryan Rawson <ryano...@gmail.com>
> >> >>> >>>>> To: hbase-dev@hadoop.apache.org
> >> >>> >>>>> Sent: Sat, November 14, 2009 11:22:13 PM
> >> >>> >>>>> Subject: Re: Should we change the default value of
> >> >>> >>>>> hbase.regionserver.flushlogentries for 0.21?
> >> >>> >>>>>
> >> >>> >>>>> That sync at the end of a RPC is my doing. You dont want to sync
> >> >>> >>>>> every _EDIT_, after all, the previous definition of the word "edit"
> >> >>> >>>>> was each KeyValue. So we could be calling sync for every single
> >> >>> >>>>> column in a row. Bad stuff.
> >> >>> >>>>>
> >> >>> >>>>> In the end, if the regionserver crashes during a batch put, we will
> >> >>> >>>>> never know how much of the batch was flushed to the WAL. Thus it makes
> >> >>> >>>>> sense to only do it once and get a massive, massive, speedup.
> >> >>> >>>>>
> >> >>> >>>>> On Sat, Nov 14, 2009 at 9:45 PM, stack <st...@duboce.net> wrote:
> >> >>> >>>>>
> >> >>> >>>>>> I'm for leaving it as it is, at every 100 edits -- maybe every 10
> >> >>> >>>>>> edits? Speed stays as it was. We used to lose MBs. By default,
> >> >>> >>>>>> we'll now lose 99 or 9 edits max.
> >> >>> >>>>>>
> >> >>> >>>>>> We need to do some work bringing folks along regardless of what we
> >> >>> >>>>>> decide. Flush happens at the end of the put up in the regionserver.
> >> >>> >>>>>> If you are doing a batch of commits -- e.g.
> >> >>> >>>>>> using a big write buffer over on
> >> >>> >>>>>> your client -- the puts will only be flushed on the way out after
> >> >>> >>>>>> the batch put completes EVEN if you have configured hbase to sync
> >> >>> >>>>>> every edit (I ran into this this evening. J-D sorted me out). We
> >> >>> >>>>>> need to make sure folks are up on this.
> >> >>> >>>>>>
> >> >>> >>>>>> St.Ack
> >> >>> >>>>>>
> >> >>> >>>>>>
> >> >>> >>>>>> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans
> >> >>> >>>>>> <jdcry...@apache.org> wrote:
> >> >>> >>>>>>
> >> >>> >>>>>>> Hi dev!
> >> >>> >>>>>>>
> >> >>> >>>>>>> Hadoop 0.21 now has a reliable append and flush feature and this
> >> >>> >>>>>>> gives us the opportunity to review some assumptions. The current
> >> >>> >>>>>>> situation:
> >> >>> >>>>>>>
> >> >>> >>>>>>> - Every edit going to a catalog table is flushed so there's no
> >> >>> >>>>>>> data loss.
> >> >>> >>>>>>> - The user tables edits are flushed every
> >> >>> >>>>>>> hbase.regionserver.flushlogentries which by default is 100.
> >> >>> >>>>>>>
> >> >>> >>>>>>> Should we now set this value to 1 in order to have more durable
> >> >>> >>>>>>> but slower inserts by default? Please speak up.
> >> >>> >>>>>>>
> >> >>> >>>>>>> Thx,
> >> >>> >>>>>>>
> >> >>> >>>>>>> J-D
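
The timings quoted in the thread can be put side by side with a short calculation. The Python sketch below only restates the millisecond figures posted above (standalone runs from Dec 3, cluster runs from Dec 10, all without the write buffer) and reports how much less time each setting took than flushlogentries=1. The dictionary layout and the helper name `reduction_vs_baseline` are illustrative assumptions, not anything from HBase, and runs are simply averaged where two were reported.

```python
# Timings in milliseconds, copied from the messages above.
# Where two runs were reported, both are kept and averaged.
standalone_ms = {
    "flushlogentries=1": [354765],
    "flushlogentries=100": [333972, 331943],
    "deferred log flush": [309857, 311440],
}
cluster_ms = {
    "flushlogentries=1": [2188737],
    "flushlogentries=100": [697590, 698082],
    "deferred log flush": [545836, 532788],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def reduction_vs_baseline(timings, baseline="flushlogentries=1"):
    """Map each config to the percentage of time saved relative to the baseline.

    This helper is hypothetical; it exists only to make the comparison explicit.
    """
    base = mean(timings[baseline])
    return {
        name: 100.0 * (base - mean(runs)) / base
        for name, runs in timings.items()
    }


for label, timings in (("standalone", standalone_ms), ("cluster", cluster_ms)):
    for name, pct in reduction_vs_baseline(timings).items():
        print(f"{label:<10} {name:<20} {pct:5.1f}% less time than flushlogentries=1")
```

On the quoted figures this gives roughly 6% and 12% on the standalone setup and roughly 68% and 75% on the cluster. The ~7% and 14% in the Dec 3 message appear to match the same ratio taken the other way round, that is, how much longer flushlogentries=1 takes than each alternative.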