On Tue, Jan 12, 2010 at 11:29 AM, Kannan Muthukkaruppan <kan...@facebook.com> wrote:
>
> For data integrity, going with group commits (batch commits) seems like a
> good option. My understanding of group commits as implemented in 0.21 is as
> follows:
>
> * We wait on acknowledging back to the client until the transaction
> has been synced to HDFS.

Yes

>
> * Syncs are batched: a sync is called if the queue has enough
> transactions or if a timer expires. (I would imagine that both the # of
> transactions to batch up as well as the timer are configurable knobs already?) In
> this mode, for the client, the latency increase on writes is upper bounded by
> the timer setting + the cost of sync itself.

Nope. There are two kinds of group commit around that piece of code:

1) What you called batch commit, which is driven by a configurable value
(flushlogentries): the number of entries we have to append before a sync is
triggered. Clients aren't held until that sync happens, so a region server
failure could lose some rows, depending on the time between the last sync
and the failure. If flushlogentries=100 and 99 entries have been sitting
around for longer than the timer's timeout (default 1 sec), the timer will
force a sync of those entries.

2) Group commit kicks in under high concurrency and is only useful if a
large number of clients are writing at the same time and flushlogentries=1.
What happens in the LogSyncer thread is that instead of calling sync() for
every entry, we "group" the clients waiting on the previous sync and issue
only one sync for all of them. In this case, when the call returns in the
client, we are sure that the value is in HDFS.

>
> From: saint....@gmail.com [mailto:saint....@gmail.com] On Behalf Of stack
> Sent: Tuesday, January 12, 2010 10:52 AM
> To: hbase-dev@hadoop.apache.org
> Cc: Kannan Muthukkaruppan; Dhruba Borthakur
> Subject: Re: commit semantics
>
> On Tue, Jan 12, 2010 at 10:14 AM, Dhruba Borthakur
> <dhr...@gmail.com> wrote:
> Hi stack,
>
> I was meaning "what if the application inserted the same record into two
> Hbase instances"? Of course, now the onus is on the appl to keep both of
> them in sync and recover from any inconsistencies between them.
>
> Ok. Like your "Overlapping Clusters for HA" from
> http://www.borthakur.com/ftp/hdfs_high_availability.pdf?
>
> I'm not sure how the application could return after writing one cluster
> without waiting on the second to complete as you suggest above. It could
> write in parallel but the second thread might not complete for myriad
> reasons. What then? And as you say, reading, the client would have to make
> reconciliation.
>
> Isn't there already a 'scalable database' that gives you this headache for
> free without your having to do work on your part (smile)?
>
> Do you think there is a problem syncing on every write (with some batching of
> writes happening under high concurrency) or, if that is too slow for your needs,
> adding the holding of clients until sync happens as Joydeep suggests? Will
> that be sufficient data integrity-wise?
>
> St.Ack
>
> Thanks,
> St.Ack
>
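The two sync modes described in the reply above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not HBase's actual LogSyncer code: `BatchCommitLog`, `GroupCommitLog`, `sync_fn` (a stand-in for the HDFS sync) and the `flush_log_entries`/`flush_interval` parameters are names invented for the sketch. As a simplification, the timer here flushes whatever is pending on each tick rather than tracking how long each entry has waited.

```python
import threading


class BatchCommitLog:
    """Mode 1 (batch commit): sync every flush_log_entries appends or on a timer."""

    def __init__(self, sync_fn, flush_log_entries=100, flush_interval=1.0):
        self._sync_fn = sync_fn                # hypothetical stand-in for the HDFS sync
        self._flush_log_entries = flush_log_entries
        self._flush_interval = flush_interval  # seconds between timer flushes
        self._pending = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._timer = threading.Thread(target=self._timer_loop, daemon=True)
        self._timer.start()

    def append(self, entry):
        with self._lock:
            self._pending.append(entry)
            if len(self._pending) >= self._flush_log_entries:
                self._flush_locked()
        # Returns before a sync covers `entry` unless this call hit the threshold,
        # so entries still pending are lost if the process dies now.

    def _flush_locked(self):
        if self._pending:
            batch, self._pending = self._pending, []
            self._sync_fn(batch)

    def _timer_loop(self):
        # Simplification: flush whatever is pending every flush_interval seconds.
        while not self._stop.wait(self._flush_interval):
            with self._lock:
                self._flush_locked()

    def close(self):
        self._stop.set()
        self._timer.join()
        with self._lock:
            self._flush_locked()


class GroupCommitLog:
    """Mode 2 (group commit, flushlogentries=1): append() blocks until synced."""

    def __init__(self, sync_fn):
        self._sync_fn = sync_fn
        self._cond = threading.Condition()
        self._pending = []
        self._appended = 0  # sequence number of the last appended entry
        self._synced = 0    # sequence number of the last entry covered by a sync
        self._closed = False
        self._syncer = threading.Thread(target=self._sync_loop, daemon=True)
        self._syncer.start()

    def append(self, entry):
        with self._cond:
            self._pending.append(entry)
            self._appended += 1
            seq = self._appended
            self._cond.notify_all()       # wake the syncer thread
            while self._synced < seq:     # hold the client until its entry is synced
                self._cond.wait()

    def _sync_loop(self):
        while True:
            with self._cond:
                while not self._pending and not self._closed:
                    self._cond.wait()
                if not self._pending:     # closed and fully drained
                    return
                batch, self._pending = self._pending, []
                upto = self._appended
            # One sync for every client that queued up while the previous sync ran.
            self._sync_fn(batch)
            with self._cond:
                self._synced = upto
                self._cond.notify_all()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        self._syncer.join()
```

With a slow `sync_fn`, concurrent callers of `GroupCommitLog.append` arrive while a sync is in flight and are all covered by the next one, which is the grouping described in point 2. `BatchCommitLog` gives up that per-call durability guarantee in exchange for lower client latency, as in point 1.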