On Tue, Jan 12, 2010 at 11:29 AM, Kannan Muthukkaruppan
<kan...@facebook.com> wrote:
>
> For data integrity, going with group commits (batch commits) seems like a 
> good option. My understanding of group commits as implemented in 0.21 is as 
> follows:
>
> *         We wait on acknowledging back to the client until the transaction 
> has been synced to HDFS.

Yes
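
Just to make the client side concrete, a write in the 0.20/0.21-era
Java client looks roughly like this (a sketch only; the table and
column names are made up):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class PutExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical table "mytable" with a column family "cf".
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value1"));
    // Whether this call returning means the edit is synced to HDFS
    // depends on flushlogentries, as discussed below: with
    // flushlogentries=1 the group-commit path holds the call until the
    // sync returns; with a larger batch size the call can return first.
    table.put(put);
  }
}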

>
> *         Syncs are batched: a sync is called if the queue has enough
> transactions or if a timer expires. (I would imagine that both the # of
> transactions to batch up as well as the timer are configurable knobs already)? In
> this mode, for the client, the latency increase on writes is upper bounded by
> the timer setting + the cost of sync itself.

Nope. There are two kinds of group commit around that piece of code:

1) What you called batch commit: a configurable value (flushlogentries)
sets how many entries we have to append before triggering a sync.
Clients don't block until that sync happens, so a region server
failure could lose some rows, depending on the time between the last
sync and the failure.

For example, if flushlogentries=100 and 99 entries have been lying
around for longer than the timer's timeout (default 1 sec), the timer
will force-sync those entries.
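
To make that concrete, here is a rough Java sketch of the trigger
logic described above (this is not the actual HLog/LogSyncer code;
the field and method names are made up for illustration):

// Rough sketch of the batch-commit trigger, not the real HLog code.
// Mirrors the flushlogentries knob and the ~1 sec timer mentioned above.
class BatchCommitSketch {
  private final int flushLogEntries = 100;           // sync after this many appends
  private final long optionalFlushIntervalMs = 1000; // timer forces a sync after 1 sec

  private int unflushedEntries = 0;
  private long lastSyncTime = System.currentTimeMillis();

  // Called for every WAL append. The client is NOT held here, so anything
  // appended after the last sync can be lost if the region server dies.
  synchronized void append(byte[] entry) {
    writeToLog(entry);
    unflushedEntries++;
    if (unflushedEntries >= flushLogEntries) {
      syncNow();
    }
  }

  // Called periodically by a timer thread: if entries have been lying
  // around longer than the timeout, force-sync them even though the
  // threshold was never reached.
  synchronized void timerTick() {
    if (unflushedEntries > 0
        && System.currentTimeMillis() - lastSyncTime >= optionalFlushIntervalMs) {
      syncNow();
    }
  }

  private void syncNow() {
    hdfsSync();                      // stand-in for the HDFS sync/hflush call
    unflushedEntries = 0;
    lastSyncTime = System.currentTimeMillis();
  }

  private void writeToLog(byte[] entry) { /* append to the WAL writer */ }
  private void hdfsSync() { /* HDFS sync in the real code */ }
}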

2) Group commit happens at high concurrency and is only useful when a
high number of clients are writing at the same time and
flushlogentries=1. In the LogSyncer thread, instead of calling sync()
for every entry, we "group" the clients waiting on the previous sync
and issue only one sync for all of them. In this case, when the call
returns to the client, we are sure that the value is in HDFS.
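
And here is a minimal sketch of that group-commit pattern, again with
made-up names rather than the real LogSyncer code: every writer
appends its entry and then blocks; a single syncer thread issues one
sync() covering everybody who was waiting, so by the time a writer is
released its edit is in HDFS.

// Minimal group-commit sketch for the flushlogentries=1 case:
// many writer threads share one sync() instead of each issuing its own.
class GroupCommitSketch {
  private long syncedUpTo = 0;   // highest sequence id known to be in HDFS
  private long appendedUpTo = 0; // highest sequence id appended to the log

  // Writer thread: append, then block until a sync covering our edit returns.
  void appendAndWait(byte[] entry) throws InterruptedException {
    long mySeq;
    synchronized (this) {
      writeToLog(entry);
      mySeq = ++appendedUpTo;
      notifyAll();               // wake the syncer, there is work to do
      while (syncedUpTo < mySeq) {
        wait();                  // released only once a sync has covered us
      }
    }
  }

  // LogSyncer-style thread: one sync() per batch of waiting writers.
  void syncerLoop() throws InterruptedException {
    while (true) {
      long target;
      synchronized (this) {
        while (appendedUpTo == syncedUpTo) {
          wait();                // nothing new to sync
        }
        target = appendedUpTo;   // group everyone appended so far
      }
      hdfsSync();                // single sync covers all writers up to 'target'
      synchronized (this) {
        syncedUpTo = target;
        notifyAll();             // release every writer waiting on this sync
      }
    }
  }

  private void writeToLog(byte[] entry) { /* append to the WAL writer */ }
  private void hdfsSync() { /* HDFS sync in the real code */ }
}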

>
> From: saint....@gmail.com [mailto:saint....@gmail.com] On Behalf Of stack
> Sent: Tuesday, January 12, 2010 10:52 AM
> To: hbase-dev@hadoop.apache.org
> Cc: Kannan Muthukkaruppan; Dhruba Borthakur
> Subject: Re: commit semantics
>
> On Tue, Jan 12, 2010 at 10:14 AM, Dhruba Borthakur 
> <dhr...@gmail.com> wrote:
> Hi stack,
>
> I meant "what if the application inserted the same record into two
> HBase instances"? Of course, now the onus is on the application to keep both
> of them in sync and recover from any inconsistencies between them.
>
> Ok. Like your "Overlapping Clusters for HA" from
> http://www.borthakur.com/ftp/hdfs_high_availability.pdf?
>
> I'm not sure how the application could return after writing one cluster
> without waiting on the second to complete, as you suggest above. It could
> write in parallel, but the second thread might not complete for myriad
> reasons. What then? And as you say, when reading, the client would have to
> do the reconciliation.
>
> Isn't there already a 'scalable database' that gives you this headache for
> free, without you having to do the work yourself (smile)?
>
> Do you think there is a problem with syncing on every write (with some
> batching of writes happening under high concurrency), or, if that is too
> slow for your needs, with adding the holding of clients until the sync
> happens, as Joydeep suggests? Will that be sufficient data integrity-wise?
>
> St.Ack
>
> Thanks,
> St.Ack
>
