Looks like deferred log flush is the clear winner here, and probably has a smaller chance of loss than the 100 logflushentries.
I dare say we should ship with that as the default...

-ryan

On Thu, Dec 10, 2009 at 6:02 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> So to satisfy Ryan's thirst for cluster numbers, here they are:
>
> Default (with write buffer)
> 65 060ms
>
> The rest is without the write buffer (which is so well optimized that
> we only sync once per 2MB batch). I ran it once with entries=1 because
> it's taking so long.
>
> 1 logflushentries
> 2 188 737ms
>
> 100 logflushentries
> 697 590ms
> 698 082ms
>
> deferred log flush
> 545 836ms
> 532 788ms
>
> The cluster is composed of 15 i7s (a bit overkill) but it shows that
> it runs much slower because of network, replication, etc.
>
> Also on another cluster (same hardware) I did some 0.20 testing:
>
> With write buffer:
> 131 811ms
>
> Without:
> 602 842ms
>
> Keep in mind that the sync we call isn't HDFS-265.
>
> J-D
>
> On Thu, Dec 3, 2009 at 9:53 PM, stack <st...@duboce.net> wrote:
>> Thanks for picking up this discussion again J-D.
>>
>> See below.
>>
>> On Thu, Dec 3, 2009 at 3:24 PM, Jean-Daniel Cryans
>> <jdcry...@apache.org> wrote:
>>
>>> I have the feeling that this discussion isn't over, there's no
>>> consensus yet, so I did some tests to get some numbers.
>>>
>>> PE sequentialWrite 1 with the write buffer disabled (I get the same
>>> numbers on every different config with it) on a standalone setup.
>>
>> The write buffer is disabled because otherwise it will get in the way of the
>> hbase.regionserver.flushlogentries=1?
>>
>> It would be interesting to get a baseline for 0.20 which IMO would be
>> settings we had in 0.19 w/ write buffer. Would be good for comparison.
>>
>> You like the idea of the sync being time-based rather than some number of
>> edits? I can see fellas wanting both.
>>
>> stack
>>
>>> I stopped HBase and deleted the data dir between each run.
>>>
>>> - hbase.regionserver.flushlogentries=1 and
>>> hbase.regionserver.optionallogflushinterval=1000
>>> ran in 354765ms
>>>
>>> - hbase.regionserver.flushlogentries=100 and
>>> hbase.regionserver.optionallogflushinterval=1000
>>> run #1 in 333972ms
>>> run #2 in 331943ms
>>>
>>> - hbase.regionserver.flushlogentries=1,
>>> hbase.regionserver.optionallogflushinterval=1000 and deferred flush
>>> enabled on TestTable
>>> run #1 in 309857ms
>>> run #2 in 311440ms
>>>
>>> So 100 entries per flush takes ~7% less time, deferred flush takes 14%
>>> less.
>>>
>>> I thereby think that not only should we set flushlogentries=1 in 0.21,
>>> but also we should enable deferred log flush by default with a lower
>>> optional log flush interval. It will be a nearly as safe but much
>>> faster alternative to the previous option. I would even get rid of the
>>> hbase.regionserver.flushlogentries config.
>>>
>>> J-D
>>>
>>> On Tue, Nov 17, 2009 at 7:10 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>> > Well it's even better than that ;) We have optional log flushing which
>>> > by default is 10 secs. Make that 100 milliseconds and that's as much
>>> > data as you can lose. If any other table syncs then this table's edits
>>> > are also synced.
>>> >
>>> > J-D
>>> >
>>> > On Tue, Nov 17, 2009 at 4:36 PM, Jonathan Gray <jl...@streamy.com> wrote:
>>> >> Thoughts on a client-facing call to explicitly call a WAL sync? So I could
>>> >> turn on DEFERRED_LOG_FLUSH (possibly leave it on always), run a batch of
>>> >> my inserts, and then run an explicit flush/sync. The returning of that
>>> >> call would guarantee to the client that the data up to that point is safe.
>>> >>
>>> >> JG
>>> >>
>>> >> On Mon, November 16, 2009 11:00 am, Jean-Daniel Cryans wrote:
>>> >>> I added a new feature for tables called "deferred flush", see
>>> >>> https://issues.apache.org/jira/browse/HBASE-1944
>>> >>>
>>> >>> My opinion is that the default should be paranoid enough to not lose
>>> >>> any user data. If we can change a table's attribute without taking it down
>>> >>> (there's a jira on that), wouldn't that solve the import problem?
>>> >>>
>>> >>> For example: have some table that needs to have fast insertion via MR.
>>> >>> During the creation of the job, you change the table's
>>> >>> DEFERRED_LOG_FLUSH to "true", then run the job and finally set the
>>> >>> value to false when the job is done.
>>> >>>
>>> >>> This way you still pass the responsibility to the user but for
>>> >>> performance reasons.
>>> >>>
>>> >>> J-D
>>> >>>
>>> >>> On Mon, Nov 16, 2009 at 2:05 AM, Cosmin Lehene <cleh...@adobe.com> wrote:
>>> >>>
>>> >>>> We could have a speedy default and an extra parameter for puts that
>>> >>>> would specify a flush is needed. This way you pass the responsibility to
>>> >>>> the user and he can decide if he needs to be paranoid or not. This could
>>> >>>> be part of Put and even specify granularity of the flush if needed.
>>> >>>>
>>> >>>> Cosmin
>>> >>>>
>>> >>>> On 11/15/09 6:59 PM, "Andrew Purtell" <apurt...@apache.org> wrote:
>>> >>>>
>>> >>>>> I agree with this.
>>> >>>>>
>>> >>>>> I also think we should leave the default as is with the caveat that
>>> >>>>> we call out the durability versus write performance tradeoff in the
>>> >>>>> flushlogentries description and up on the wiki somewhere, maybe on
>>> >>>>> http://wiki.apache.org/hadoop/PerformanceTuning . We could also
>>> >>>>> provide two example configurations, one for performance (reasonable
>>> >>>>> tradeoffs), one for paranoia.
>>> >>>>> I put up an issue:
>>> >>>>> https://issues.apache.org/jira/browse/HBASE-1984
>>> >>>>>
>>> >>>>> - Andy
>>> >>>>>
>>> >>>>> ________________________________
>>> >>>>> From: Ryan Rawson <ryano...@gmail.com>
>>> >>>>> To: hbase-dev@hadoop.apache.org
>>> >>>>> Sent: Sat, November 14, 2009 11:22:13 PM
>>> >>>>> Subject: Re: Should we change the default value of
>>> >>>>> hbase.regionserver.flushlogentries for 0.21?
>>> >>>>>
>>> >>>>> That sync at the end of an RPC is my doing. You don't want to sync
>>> >>>>> every _EDIT_, after all, the previous definition of the word "edit"
>>> >>>>> was each KeyValue. So we could be calling sync for every single
>>> >>>>> column in a row. Bad stuff.
>>> >>>>>
>>> >>>>> In the end, if the regionserver crashes during a batch put, we will
>>> >>>>> never know how much of the batch was flushed to the WAL. Thus it makes
>>> >>>>> sense to only do it once and get a massive, massive, speedup.
>>> >>>>>
>>> >>>>> On Sat, Nov 14, 2009 at 9:45 PM, stack <st...@duboce.net> wrote:
>>> >>>>>
>>> >>>>>> I'm for leaving it as it is, at every 100 edits -- maybe every 10
>>> >>>>>> edits? Speed stays as it was. We used to lose MBs. By default,
>>> >>>>>> we'll now lose 99 or 9 edits max.
>>> >>>>>>
>>> >>>>>> We need to do some work bringing folks along regardless of what we
>>> >>>>>> decide. Flush happens at the end of the put up in the regionserver.
>>> >>>>>> If you are doing a batch of commits -- e.g. using a big write buffer
>>> >>>>>> over on your client -- the puts will only be flushed on the way out after
>>> >>>>>> the batch put completes EVEN if you have configured hbase to sync
>>> >>>>>> every edit (I ran into this this evening. J-D sorted me out). We
>>> >>>>>> need to make sure folks are up on this.
>>> >>>>>>
>>> >>>>>> St.Ack
>>> >>>>>>
>>> >>>>>> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans
>>> >>>>>> <jdcry...@apache.org> wrote:
>>> >>>>>>
>>> >>>>>>> Hi dev!
>>> >>>>>>>
>>> >>>>>>> Hadoop 0.21 now has a reliable append and flush feature and this
>>> >>>>>>> gives us the opportunity to review some assumptions. The current
>>> >>>>>>> situation:
>>> >>>>>>>
>>> >>>>>>> - Every edit going to a catalog table is flushed so there's no
>>> >>>>>>> data loss.
>>> >>>>>>> - The user tables edits are flushed every
>>> >>>>>>> hbase.regionserver.flushlogentries which by default is 100.
>>> >>>>>>>
>>> >>>>>>> Should we now set this value to 1 in order to have more durable
>>> >>>>>>> but slower inserts by default? Please speak up.
>>> >>>>>>>
>>> >>>>>>> Thx,
>>> >>>>>>>
>>> >>>>>>> J-D
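
For anyone who wants to recompute the comparisons quoted in this thread, here is a minimal Python sketch. The timings are copied from the messages above (J-D's standalone and 15-node cluster runs). The names `STANDALONE`, `CLUSTER` and `compare` are made up for this illustration and are not part of HBase or of the PE tool. It averages the runs for each setting and reports two figures against the flushlogentries=1 baseline: how much less time the setting takes, and how much longer the baseline takes by comparison.

```python
from statistics import mean

# Timings in milliseconds, copied from the messages in this thread.
# The dictionary names and keys are illustrative labels, not HBase settings.
STANDALONE = {
    "flushlogentries=1": [354765],
    "flushlogentries=100": [333972, 331943],
    "deferred log flush": [309857, 311440],
}
CLUSTER = {
    "flushlogentries=1": [2188737],
    "flushlogentries=100": [697590, 698082],
    "deferred log flush": [545836, 532788],
}


def compare(runs, baseline="flushlogentries=1"):
    """Return {config: (pct_less_time, pct_baseline_takes_longer)}.

    Both percentages are computed from the mean run time of each config
    against the mean run time of the baseline config.
    """
    base = mean(runs[baseline])
    result = {}
    for name, times in runs.items():
        if name == baseline:
            continue
        avg = mean(times)
        less_time = 100 * (base - avg) / base   # time saved, relative to baseline
        longer = 100 * (base - avg) / avg       # extra baseline time, relative to config
        result[name] = (less_time, longer)
    return result


if __name__ == "__main__":
    for label, runs in (("standalone", STANDALONE), ("cluster", CLUSTER)):
        for name, (less, longer) in compare(runs).items():
            print(f"{label:10s} {name:20s} {less:5.1f}% less time "
                  f"(baseline takes {longer:5.1f}% longer)")
```

On the standalone numbers this gives roughly 6% and 12% less time for 100 entries and deferred flush respectively, or, read the other way, a baseline that takes roughly 6.5% and 14% longer. The "~7%" and "14%" quoted above look closer to the second reading, but that is an inference from the arithmetic, not something stated in the thread.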