On Mon, Nov 16, 2009 at 2:05 AM, Cosmin Lehene <cleh...@adobe.com> wrote:
> We could have a speedy default and an extra parameter for puts that would
> specify a flush is needed. This way you pass the responsibility to the user
> and he can decide if he needs to be paranoid or not. This could be part of
> Put and even specify granularity of the flush if needed.

I like this idea.
St.Ack

> Cosmin
>
>
> On 11/15/09 6:59 PM, "Andrew Purtell" <apurt...@apache.org> wrote:
>
> > I agree with this.
> >
> > I also think we should leave the default as is, with the caveat that we call
> > out the durability versus write performance tradeoff in the flushlogentries
> > description and up on the wiki somewhere, maybe on
> > http://wiki.apache.org/hadoop/PerformanceTuning . We could also provide two
> > example configurations, one for performance (reasonable tradeoffs), one for
> > paranoia. I put up an issue: https://issues.apache.org/jira/browse/HBASE-1984
> >
> >   - Andy
> >
> >
> > ________________________________
> > From: Ryan Rawson <ryano...@gmail.com>
> > To: hbase-dev@hadoop.apache.org
> > Sent: Sat, November 14, 2009 11:22:13 PM
> > Subject: Re: Should we change the default value of
> > hbase.regionserver.flushlogentries for 0.21?
> >
> > That sync at the end of an RPC is my doing. You don't want to sync every
> > _EDIT_; after all, the previous definition of the word "edit" was each
> > KeyValue, so we could be calling sync for every single column in a
> > row. Bad stuff.
> >
> > In the end, if the regionserver crashes during a batch put, we will
> > never know how much of the batch was flushed to the WAL. Thus it makes
> > sense to only do it once and get a massive, massive speedup.
> >
> > On Sat, Nov 14, 2009 at 9:45 PM, stack <st...@duboce.net> wrote:
> >> I'm for leaving it as it is, at every 100 edits -- maybe every 10 edits?
> >> Speed stays as it was. We used to lose MBs. By default, we'll now lose
> >> 99 or 9 edits max.
> >>
> >> We need to do some work bringing folks along regardless of what we decide.
> >> Flush happens at the end of the put up in the regionserver. If you are
> >> doing a batch of commits -- e.g. using a big write buffer over on your
> >> client -- the puts will only be flushed on the way out after the batch put
> >> completes EVEN if you have configured HBase to sync every edit (I ran into
> >> this this evening; J-D sorted me out). We need to make sure folks are up
> >> on this.
> >>
> >> St.Ack
> >>
> >>
> >> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans
> >> <jdcry...@apache.org> wrote:
> >>
> >>> Hi dev!
> >>>
> >>> Hadoop 0.21 now has a reliable append and flush feature, and this gives
> >>> us the opportunity to review some assumptions. The current situation:
> >>>
> >>> - Every edit going to a catalog table is flushed, so there's no data loss.
> >>> - Edits to user tables are flushed every
> >>>   hbase.regionserver.flushlogentries, which by default is 100.
> >>>
> >>> Should we now set this value to 1 in order to have more durable but
> >>> slower inserts by default? Please speak up.
> >>>
> >>> Thx,
> >>>
> >>> J-D
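
To make Cosmin's proposal concrete, here is a minimal Java sketch against the
0.20-era client API. The setFlushWal() call is hypothetical -- no such method
exists on Put today -- and is shown only to illustrate where a per-Put flush
knob would sit; everything else is the existing client API.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DurablePutSketch {
  public static void main(String[] args) throws Exception {
    // 0.20-era client boilerplate: picks up hbase-site.xml from the classpath.
    HTable table = new HTable(new HBaseConfiguration(), "testtable");

    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value"));

    // Hypothetical flag from the proposal (does NOT exist in the real Put API):
    // ask the regionserver to sync the WAL before acknowledging this put,
    // regardless of hbase.regionserver.flushlogentries.
    // put.setFlushWal(true);

    table.put(put);
  }
}

With a speedy default, only callers that explicitly set the flag would pay the
cost of a WAL sync on their own puts; everyone else keeps group commit.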
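
The two example configurations Andy mentions would presumably differ mainly in
this one property in hbase-site.xml (values are illustrative; 100 is the
current default, 1 is the sync-every-edit setting J-D asks about):

<!-- "performance" profile: group commit, sync the WAL once per 100 edits -->
<property>
  <name>hbase.regionserver.flushlogentries</name>
  <value>100</value>
</property>

<!-- "paranoia" profile: sync the WAL on every edit; durable, but slower puts -->
<property>
  <name>hbase.regionserver.flushlogentries</name>
  <value>1</value>
</property>

(Only one of the two property blocks would actually go into a given site file.)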
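
Stack's caveat about the client-side write buffer deserves a sketch of its own,
using the existing client API (table and column names are made up). With
auto-flush off, puts accumulate in the client, so nothing can be synced to the
WAL until the buffer fills or flushCommits() is called, no matter what
flushlogentries is set to:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteBufferCaveat {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "testtable");

    // Batch mode: puts are buffered in the client instead of being sent
    // one RPC at a time.
    table.setAutoFlush(false);
    table.setWriteBufferSize(2 * 1024 * 1024);  // 2MB client-side buffer

    for (int i = 0; i < 10000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value-" + i));
      table.put(put);  // may sit in the client buffer; no WAL sync possible yet
    }

    // Only now do the buffered puts go out as batch RPCs; the regionserver's
    // log-flush policy applies as it processes each batch, not per put() call.
    table.flushCommits();
  }
}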