Ok, to make sure I get this right:

- we enable deferred log flush by default
- we set flushlogentries=1
Also, since 10 seconds is kind of a huge window, I propose that:

- we set hbase.regionserver.optionallogflushinterval=1000, which is the MySQL default.

We also have to update the wiki (there's already an entry on deferred log flush) by adding the configuration of flushlogentries. I'll open a jira.

J-D

On Fri, Dec 11, 2009 at 5:26 PM, stack <st...@duboce.net> wrote:
> Yeah, +1 on deferred log flush. Good man J-D.
>
> Can we also update the performance wiki page to list how to up your write speed at the cost of possible increased edit loss?
>
> St.Ack
>
> On Fri, Dec 11, 2009 at 1:35 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>> Looks like deferred log flush is the clear winner here, and probably has a smaller chance of loss than the 100 logflushentries.
>>
>> I dare say we should ship with that as the default...
>>
>> -ryan
>>
>> On Thu, Dec 10, 2009 at 6:02 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>> > So to satisfy Ryan's thirst for cluster numbers, here they are:
>> >
>> > Default (with write buffer)
>> > 65 060ms
>> >
>> > The rest is without the write buffer (which is so well optimized that we only sync once per 2MB batch). I ran it once with entries=1 because it's taking so long.
>> >
>> > 1 logflushentries
>> > 2 188 737ms
>> >
>> > 100 logflushentries
>> > 697 590ms
>> > 698 082ms
>> >
>> > deferred log flush
>> > 545 836ms
>> > 532 788ms
>> >
>> > The cluster is composed of 15 i7s (a bit overkill) but it shows that it runs much slower because of network, replication, etc.
>> >
>> > Also on another cluster (same hardware) I did some 0.20 testing:
>> >
>> > With write buffer:
>> > 131 811ms
>> >
>> > Without:
>> > 602 842ms
>> >
>> > Keep in mind that the sync we call isn't HDFS-265.
>> >
>> > J-D
>> >
>> > On Thu, Dec 3, 2009 at 9:53 PM, stack <st...@duboce.net> wrote:
>> >> Thanks for picking up this discussion again J-D.
>> >>
>> >> See below.
>> >>
>> >> On Thu, Dec 3, 2009 at 3:24 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>> >>> I have the feeling that this discussion isn't over, there's no consensus yet, so I did some tests to get some numbers.
>> >>>
>> >>> PE sequentialWrite 1 with the write buffer disabled (I get the same numbers on every different config with it) on a standalone setup.
>> >>
>> >> The write buffer is disabled because otherwise it will get in the way of the hbase.regionserver.flushlogentries=1?
>> >>
>> >> It would be interesting to get a baseline for 0.20, which IMO would be the settings we had in 0.19 w/ write buffer. Would be good for comparison.
>> >>
>> >> You like the idea of the sync being time-based rather than some number of edits? I can see fellas wanting both.
>> >>
>> >> stack
>> >>
>> >>> I stopped HBase and deleted the data dir between each run.
>> >>>
>> >>> - hbase.regionserver.flushlogentries=1 and hbase.regionserver.optionallogflushinterval=1000
>> >>>   ran in 354765ms
>> >>>
>> >>> - hbase.regionserver.flushlogentries=100 and hbase.regionserver.optionallogflushinterval=1000
>> >>>   run #1 in 333972ms
>> >>>   run #2 in 331943ms
>> >>>
>> >>> - hbase.regionserver.flushlogentries=1, hbase.regionserver.optionallogflushinterval=1000 and deferred flush enabled on TestTable
>> >>>   run #1 in 309857ms
>> >>>   run #2 in 311440ms
>> >>>
>> >>> So 100 entries per flush takes ~7% less time, deferred flush takes 14% less.
>> >>>
>> >>> I thereby think that not only should we set flushlogentries=1 in 0.21, but also we should enable deferred log flush by default with a lower optional log flush interval. It will be a nearly as safe but much faster alternative to the previous option. I would even get rid of the hbase.regionserver.flushlogentries config.
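The configurations J-D compares above would look something like this in hbase-site.xml (an illustrative fragment, not an official recommendation; the property names are the ones used in the thread, and shipped defaults may differ by release):

```xml
<!-- Sync the WAL on every edit, and run the optional log flush every second. -->
<property>
  <name>hbase.regionserver.flushlogentries</name>
  <value>1</value>
</property>
<property>
  <name>hbase.regionserver.optionallogflushinterval</name>
  <value>1000</value> <!-- milliseconds -->
</property>
```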
>> >>>
>> >>> J-D
>> >>>
>> >>> On Tue, Nov 17, 2009 at 7:10 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>> >>> > Well it's even better than that ;) We have optional log flushing which by default is 10 secs. Make that 100 milliseconds and that's as much data as you can lose. If any other table syncs then this table's edits are also synced.
>> >>> >
>> >>> > J-D
>> >>> >
>> >>> > On Tue, Nov 17, 2009 at 4:36 PM, Jonathan Gray <jl...@streamy.com> wrote:
>> >>> >> Thoughts on a client-facing call to explicitly call a WAL sync? So I could turn on DEFERRED_LOG_FLUSH (possibly leave it on always), run a batch of my inserts, and then run an explicit flush/sync. The returning of that call would guarantee to the client that the data up to that point is safe.
>> >>> >>
>> >>> >> JG
>> >>> >>
>> >>> >> On Mon, November 16, 2009 11:00 am, Jean-Daniel Cryans wrote:
>> >>> >>> I added a new feature for tables called "deferred flush", see https://issues.apache.org/jira/browse/HBASE-1944
>> >>> >>>
>> >>> >>> My opinion is that the default should be paranoid enough to not lose any user data. If we can change a table's attribute without taking it down (there's a jira on that), wouldn't that solve the import problem?
>> >>> >>>
>> >>> >>> For example: have some table that needs to have fast insertion via MR. During the creation of the job, you change the table's DEFERRED_LOG_FLUSH to "true", then run the job and finally set the value to false when the job is done.
>> >>> >>>
>> >>> >>> This way you still pass the responsibility to the user but for performance reasons.
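The import workflow J-D describes might look roughly like the following against the client API of that era. This is a hypothetical, non-runnable sketch: it assumes the HBASE-1944 setDeferredLogFlush accessor and a live cluster, and the disable/modify/enable dance stands in for the not-yet-done "change a table attribute online" jira.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical sketch: flip DEFERRED_LOG_FLUSH on for a bulk MR import,
// then flip it back off once the job is done.
public class DeferredFlushToggle {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
    byte[] table = Bytes.toBytes("TestTable");

    HTableDescriptor desc = admin.getTableDescriptor(table);
    desc.setDeferredLogFlush(true);  // trade a small loss window for speed
    admin.disableTable(table);       // attribute changes need the table offline
    admin.modifyTable(table, desc);
    admin.enableTable(table);

    // ... run the MapReduce import job here ...

    desc.setDeferredLogFlush(false); // back to the paranoid default
    admin.disableTable(table);
    admin.modifyTable(table, desc);
    admin.enableTable(table);
  }
}
```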
>> >>> >>>
>> >>> >>> J-D
>> >>> >>>
>> >>> >>> On Mon, Nov 16, 2009 at 2:05 AM, Cosmin Lehene <cleh...@adobe.com> wrote:
>> >>> >>>> We could have a speedy default and an extra parameter for puts that would specify a flush is needed. This way you pass the responsibility to the user and he can decide if he needs to be paranoid or not. This could be part of Put and even specify granularity of the flush if needed.
>> >>> >>>>
>> >>> >>>> Cosmin
>> >>> >>>>
>> >>> >>>> On 11/15/09 6:59 PM, "Andrew Purtell" <apurt...@apache.org> wrote:
>> >>> >>>>> I agree with this.
>> >>> >>>>>
>> >>> >>>>> I also think we should leave the default as is, with the caveat that we call out the durability versus write performance tradeoff in the flushlogentries description and up on the wiki somewhere, maybe on http://wiki.apache.org/hadoop/PerformanceTuning . We could also provide two example configurations, one for performance (reasonable tradeoffs), one for paranoia. I put up an issue: https://issues.apache.org/jira/browse/HBASE-1984
>> >>> >>>>>
>> >>> >>>>> - Andy
>> >>> >>>>>
>> >>> >>>>> ________________________________
>> >>> >>>>> From: Ryan Rawson <ryano...@gmail.com>
>> >>> >>>>> To: hbase-dev@hadoop.apache.org
>> >>> >>>>> Sent: Sat, November 14, 2009 11:22:13 PM
>> >>> >>>>> Subject: Re: Should we change the default value of hbase.regionserver.flushlogentries for 0.21?
>> >>> >>>>>
>> >>> >>>>> That sync at the end of an RPC is my doing. You don't want to sync every _EDIT_; after all, the previous definition of the word "edit" was each KeyValue. So we could be calling sync for every single column in a row.
>> >>> >>>>> Bad stuff.
>> >>> >>>>>
>> >>> >>>>> In the end, if the regionserver crashes during a batch put, we will never know how much of the batch was flushed to the WAL. Thus it makes sense to only do it once and get a massive, massive speedup.
>> >>> >>>>>
>> >>> >>>>> On Sat, Nov 14, 2009 at 9:45 PM, stack <st...@duboce.net> wrote:
>> >>> >>>>>> I'm for leaving it as it is, at every 100 edits -- maybe every 10 edits? Speed stays as it was. We used to lose MBs. By default, we'll now lose 99 or 9 edits max.
>> >>> >>>>>>
>> >>> >>>>>> We need to do some work bringing folks along regardless of what we decide. Flush happens at the end of the put up in the regionserver. If you are doing a batch of commits -- e.g. using a big write buffer over on your client -- the puts will only be flushed on the way out after the batch put completes EVEN if you have configured hbase to sync every edit (I ran into this this evening; J-D sorted me out). We need to make sure folks are up on this.
>> >>> >>>>>>
>> >>> >>>>>> St.Ack
>> >>> >>>>>>
>> >>> >>>>>> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>> >>> >>>>>>> Hi dev!
>> >>> >>>>>>>
>> >>> >>>>>>> Hadoop 0.21 now has a reliable append and flush feature and this gives us the opportunity to review some assumptions. The current situation:
>> >>> >>>>>>>
>> >>> >>>>>>> - Every edit going to a catalog table is flushed so there's no data loss.
>> >>> >>>>>>> - The user table edits are flushed every hbase.regionserver.flushlogentries, which by default is 100.
>> >>> >>>>>>>
>> >>> >>>>>>> Should we now set this value to 1 in order to have more durable but slower inserts by default? Please speak up.
>> >>> >>>>>>>
>> >>> >>>>>>> Thx,
>> >>> >>>>>>>
>> >>> >>>>>>> J-D
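The client-side write-buffer pitfall stack mentions (edits buffered in the client never reach the server, so no server-side WAL sync setting can protect them) might be illustrated like this; a hypothetical sketch against the 0.20-era client API, not runnable without a cluster:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical sketch: with the write buffer on, puts accumulate in the
// client and only hit the regionserver (and thus the WAL) on flush.
public class WriteBufferPitfall {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "TestTable");
    table.setAutoFlush(false);                 // enable client-side buffering
    table.setWriteBufferSize(2 * 1024 * 1024); // e.g. the 2MB batch from the thread

    for (int i = 0; i < 10000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("info"), Bytes.toBytes("q"), Bytes.toBytes(i));
      table.put(put); // likely buffered locally; a client crash here loses these edits
    }
    table.flushCommits(); // only now are buffered edits sent (and WAL-appended)
  }
}
```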