Yeah, +1 on deferred log flush. Good man J-D. Can we also update the performance wiki page to list how to up your write speed at the cost of possibly increased edit loss?
St.Ack

On Fri, Dec 11, 2009 at 1:35 PM, Ryan Rawson <ryano...@gmail.com> wrote:
> Looks like deferred log flush is the clear winner here, and probably
> has a smaller chance of loss than the 100 logflushentries.
>
> I dare say we should ship with that as the default...
>
> -ryan
>
> On Thu, Dec 10, 2009 at 6:02 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> > So to satisfy Ryan's thirst for cluster numbers, here they are:
> >
> > Default (with write buffer)
> > 65 060ms
> >
> > The rest is without the write buffer (which is so well optimized that
> > we only sync once per 2MB batch). I ran it once with entries=1 because
> > it's taking so long.
> >
> > 1 logflushentries
> > 2 188 737ms
> >
> > 100 logflushentries
> > 697 590ms
> > 698 082ms
> >
> > deferred log flush
> > 545 836ms
> > 532 788ms
> >
> > The cluster is composed of 15 i7s (a bit overkill) but it shows that
> > it runs much slower because of network, replication, etc.
> >
> > Also on another cluster (same hardware) I did some 0.20 testing:
> >
> > With write buffer:
> > 131 811ms
> >
> > Without:
> > 602 842ms
> >
> > Keep in mind that the sync we call isn't HDFS-265.
> >
> > J-D
> >
> > On Thu, Dec 3, 2009 at 9:53 PM, stack <st...@duboce.net> wrote:
> >> Thanks for picking up this discussion again J-D.
> >>
> >> See below.
> >>
> >> On Thu, Dec 3, 2009 at 3:24 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >>
> >>> I have the feeling that this discussion isn't over, there's no
> >>> consensus yet, so I did some tests to get some numbers.
> >>>
> >>> PE sequentialWrite 1 with the write buffer disabled (I get the same
> >>> numbers on every different config with it) on a standalone setup.
> >>
> >> The write buffer is disabled because otherwise it will get in the way of the
> >> hbase.regionserver.flushlogentries=1?
> >>
> >> It would be interesting to get a baseline for 0.20 which IMO would be
> >> settings we had in 0.19 w/ write buffer. Would be good for comparison.
> >>
> >> You like the idea of the sync being time-based rather than some number of
> >> edits? I can see fellas wanting both.
> >>
> >> stack
> >>
> >>> I stopped HBase and deleted the data dir between each run.
> >>>
> >>> - hbase.regionserver.flushlogentries=1 and
> >>> hbase.regionserver.optionallogflushinterval=1000
> >>> ran in 354765ms
> >>>
> >>> - hbase.regionserver.flushlogentries=100 and
> >>> hbase.regionserver.optionallogflushinterval=1000
> >>> run #1 in 333972ms
> >>> run #2 in 331943ms
> >>>
> >>> - hbase.regionserver.flushlogentries=1,
> >>> hbase.regionserver.optionallogflushinterval=1000 and deferred flush
> >>> enabled on TestTable
> >>> run #1 in 309857ms
> >>> run #2 in 311440ms
> >>>
> >>> So 100 entries per flush takes ~7% less time, deferred flush takes 14%
> >>> less.
> >>>
> >>> I thereby think that not only should we set flushlogentries=1 in 0.21,
> >>> but also we should enable deferred log flush by default with a lower
> >>> optional log flush interval. It will be a nearly as safe but much
> >>> faster alternative to the previous option. I would even get rid of the
> >>> hbase.regionserver.flushlogentries config.
> >>>
> >>> J-D
> >>>
> >>> On Tue, Nov 17, 2009 at 7:10 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >>> > Well it's even better than that ;) We have optional log flushing which
> >>> > by default is 10 secs. Make that 100 milliseconds and that's as much
> >>> > data as you can lose. If any other table syncs then this table's edits
> >>> > are also synced.
> >>> >
> >>> > J-D
> >>> >
> >>> > On Tue, Nov 17, 2009 at 4:36 PM, Jonathan Gray <jl...@streamy.com> wrote:
> >>> >> Thoughts on a client-facing call to explicitly call a WAL sync? So I could
> >>> >> turn on DEFERRED_LOG_FLUSH (possibly leave it on always), run a batch of
> >>> >> my inserts, and then run an explicit flush/sync.
> >>> >> The returning of that call would guarantee to the client that the
> >>> >> data up to that point is safe.
> >>> >>
> >>> >> JG
> >>> >>
> >>> >> On Mon, November 16, 2009 11:00 am, Jean-Daniel Cryans wrote:
> >>> >>> I added a new feature for tables called "deferred flush", see
> >>> >>> https://issues.apache.org/jira/browse/HBASE-1944
> >>> >>>
> >>> >>> My opinion is that the default should be paranoid enough to not lose
> >>> >>> any user data. If we can change a table's attribute without taking it down
> >>> >>> (there's a jira on that), wouldn't that solve the import problem?
> >>> >>>
> >>> >>> For example: have some table that needs to have fast insertion via MR.
> >>> >>> During the creation of the job, you change the table's
> >>> >>> DEFERRED_LOG_FLUSH to "true", then run the job and finally set the
> >>> >>> value to false when the job is done.
> >>> >>>
> >>> >>> This way you still pass the responsibility to the user but for
> >>> >>> performance reasons.
> >>> >>>
> >>> >>> J-D
> >>> >>>
> >>> >>> On Mon, Nov 16, 2009 at 2:05 AM, Cosmin Lehene <cleh...@adobe.com> wrote:
> >>> >>>
> >>> >>>> We could have a speedy default and an extra parameter for puts that
> >>> >>>> would specify a flush is needed. This way you pass the responsibility to
> >>> >>>> the user and he can decide if he needs to be paranoid or not. This could
> >>> >>>> be part of Put and even specify granularity of the flush if needed.
> >>> >>>>
> >>> >>>> Cosmin
> >>> >>>>
> >>> >>>> On 11/15/09 6:59 PM, "Andrew Purtell" <apurt...@apache.org> wrote:
> >>> >>>>
> >>> >>>>> I agree with this.
> >>> >>>>>
> >>> >>>>> I also think we should leave the default as is with the caveat that
> >>> >>>>> we call out the durability versus write performance tradeoff in the
> >>> >>>>> flushlogentries description and up on the wiki somewhere, maybe on
> >>> >>>>> http://wiki.apache.org/hadoop/PerformanceTuning . We could also
> >>> >>>>> provide two example configurations, one for performance (reasonable
> >>> >>>>> tradeoffs), one for paranoia. I put up an issue:
> >>> >>>>> https://issues.apache.org/jira/browse/HBASE-1984
> >>> >>>>>
> >>> >>>>> - Andy
> >>> >>>>>
> >>> >>>>> ________________________________
> >>> >>>>> From: Ryan Rawson <ryano...@gmail.com>
> >>> >>>>> To: hbase-dev@hadoop.apache.org
> >>> >>>>> Sent: Sat, November 14, 2009 11:22:13 PM
> >>> >>>>> Subject: Re: Should we change the default value of
> >>> >>>>> hbase.regionserver.flushlogentries for 0.21?
> >>> >>>>>
> >>> >>>>> That sync at the end of an RPC is my doing. You don't want to sync
> >>> >>>>> every _EDIT_; after all, the previous definition of the word "edit"
> >>> >>>>> was each KeyValue. So we could be calling sync for every single
> >>> >>>>> column in a row. Bad stuff.
> >>> >>>>>
> >>> >>>>> In the end, if the regionserver crashes during a batch put, we will
> >>> >>>>> never know how much of the batch was flushed to the WAL. Thus it makes
> >>> >>>>> sense to only do it once and get a massive, massive, speedup.
> >>> >>>>>
> >>> >>>>> On Sat, Nov 14, 2009 at 9:45 PM, stack <st...@duboce.net> wrote:
> >>> >>>>>
> >>> >>>>>> I'm for leaving it as it is, at every 100 edits -- maybe every 10
> >>> >>>>>> edits? Speed stays as it was. We used to lose MBs. By default,
> >>> >>>>>> we'll now lose 99 or 9 edits max.
> >>> >>>>>>
> >>> >>>>>> We need to do some work bringing folks along regardless of what we
> >>> >>>>>> decide. Flush happens at the end of the put up in the regionserver.
> >>> >>>>>> If you are doing a batch of commits -- e.g. using a big write
> >>> >>>>>> buffer over on your client -- the puts will only be flushed on the
> >>> >>>>>> way out after the batch put completes EVEN if you have configured
> >>> >>>>>> hbase to sync every edit (I ran into this this evening. J-D sorted
> >>> >>>>>> me out). We need to make sure folks are up on this.
> >>> >>>>>>
> >>> >>>>>> St.Ack
> >>> >>>>>>
> >>> >>>>>> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans
> >>> >>>>>> <jdcry...@apache.org> wrote:
> >>> >>>>>>
> >>> >>>>>>> Hi dev!
> >>> >>>>>>>
> >>> >>>>>>> Hadoop 0.21 now has a reliable append and flush feature and this
> >>> >>>>>>> gives us the opportunity to review some assumptions. The current
> >>> >>>>>>> situation:
> >>> >>>>>>>
> >>> >>>>>>> - Every edit going to a catalog table is flushed so there's no
> >>> >>>>>>> data loss.
> >>> >>>>>>> - The user tables edits are flushed every
> >>> >>>>>>> hbase.regionserver.flushlogentries which by default is 100.
> >>> >>>>>>>
> >>> >>>>>>> Should we now set this value to 1 in order to have more durable
> >>> >>>>>>> but slower inserts by default? Please speak up.
> >>> >>>>>>>
> >>> >>>>>>> Thx,
> >>> >>>>>>>
> >>> >>>>>>> J-D
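
For anyone wanting to compare the timings quoted in this thread, here is a small sketch that only does arithmetic over the numbers J-D posted (standalone and cluster runs). The helper names `pct_less_time` and `speedup` are made up for illustration and are not part of HBase or PE; runs of the same setting are simply averaged, which is an assumption on my part.

```python
from statistics import mean


def pct_less_time(baseline_ms, run_ms):
    """Percent reduction in wall-clock time versus the baseline run."""
    return 100.0 * (baseline_ms - mean(run_ms)) / baseline_ms


def speedup(baseline_ms, run_ms):
    """Baseline time divided by the mean time of the runs."""
    return baseline_ms / mean(run_ms)


# Timings copied from the thread, in milliseconds.
# The baseline in both cases is flushlogentries=1 without deferred flush.
results = {
    "standalone": {
        "baseline": 354765,
        "100 entries": [333972, 331943],
        "deferred flush": [309857, 311440],
    },
    "cluster": {
        "baseline": 2188737,
        "100 entries": [697590, 698082],
        "deferred flush": [545836, 532788],
    },
}

for setup, data in results.items():
    for setting in ("100 entries", "deferred flush"):
        less = pct_less_time(data["baseline"], data[setting])
        faster = speedup(data["baseline"], data[setting])
        print(f"{setup:10s} {setting:15s} {less:5.1f}% less time, {faster:.2f}x")
```

On the standalone numbers this comes out at roughly 6% and 12% less wall-clock time; the 14% figure corresponds to the speedup ratio (about 1.14x) rather than to the time saved. The gap is far larger on the cluster runs, which fits the observation that sync cost is dominated by network and replication there.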