That sync at the end of an RPC is my doing. You don't want to sync every _EDIT_; after all, under the previous definition of the word "edit" (each KeyValue), we could be calling sync for every single column in a row. Bad stuff.
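To make the cost concrete, here is a minimal sketch (not actual HBase code; the `WalSketch` class, its counters, and the edit strings are all hypothetical) contrasting a sync per KeyValue with a single sync at the end of the RPC:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the trade-off discussed in the thread:
// appending an edit is a cheap in-memory operation, while sync() stands
// in for the expensive call that forces edits to durable storage.
class WalSketch {
    static int syncCount = 0;                      // number of expensive sync calls issued
    static final List<String> log = new ArrayList<>();

    static void append(String edit) { log.add(edit); }  // cheap: buffer the edit
    static void sync()              { syncCount++; }    // expensive: flush to disk

    // Old interpretation: one sync per "edit", i.e. per KeyValue,
    // which means one sync per column touched in a row.
    static int putSyncEveryEdit(List<String> keyValues) {
        int before = syncCount;
        for (String kv : keyValues) { append(kv); sync(); }
        return syncCount - before;
    }

    // New behavior: buffer every edit in the batch, sync once at the
    // end of the RPC.
    static int putSyncOncePerRpc(List<String> keyValues) {
        int before = syncCount;
        for (String kv : keyValues) { append(kv); }
        sync();
        return syncCount - before;
    }

    public static void main(String[] args) {
        // Three columns in one row -> three KeyValues in one batch put.
        List<String> row = List.of("row1/cf:a", "row1/cf:b", "row1/cf:c");
        System.out.println("syncs per edit: " + putSyncEveryEdit(row));
        System.out.println("syncs per RPC:  " + putSyncOncePerRpc(row));
    }
}
```

With three columns, the per-edit scheme issues three syncs against one for the per-RPC scheme; the gap grows linearly with batch size, which is where the speedup comes from.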
In the end, if the regionserver crashes during a batch put, we will never know how much of the batch was flushed to the WAL. Thus it makes sense to sync only once per batch and get a massive, massive speedup.

On Sat, Nov 14, 2009 at 9:45 PM, stack <st...@duboce.net> wrote:
> I'm for leaving it as it is, at every 100 edits -- maybe every 10 edits?
> Speed stays as it was. We used to lose MBs. By default, we'll now lose 99
> or 9 edits max.
>
> We need to do some work bringing folks along regardless of what we decide.
> Flush happens at the end of the put up in the regionserver. If you are
> doing a batch of commits -- e.g. using a big write buffer over on your
> client -- the puts will only be flushed on the way out after the batch put
> completes EVEN if you have configured hbase to sync every edit (I ran into
> this this evening. J-D sorted me out). We need to make sure folks are up
> on this.
>
> St.Ack
>
>
> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans
> <jdcry...@apache.org> wrote:
>
>> Hi dev!
>>
>> Hadoop 0.21 now has a reliable append and flush feature and this gives
>> us the opportunity to review some assumptions. The current situation:
>>
>> - Every edit going to a catalog table is flushed so there's no data loss.
>> - The user tables' edits are flushed every
>> hbase.regionserver.flushlogentries, which by default is 100.
>>
>> Should we now set this value to 1 in order to have more durable but
>> slower inserts by default? Please speak up.
>>
>> Thx,
>>
>> J-D
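For anyone following along, the knob under discussion is set in hbase-site.xml; a sketch of the "durable but slower" option J-D proposes (the property name and its default of 100 are taken from the thread above):

```xml
<!-- hbase-site.xml: sync the WAL after every edit instead of every 100 -->
<property>
  <name>hbase.regionserver.flushlogentries</name>
  <value>1</value>
</property>
```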