Re: row level atomicity

Dhruba Borthakur Wed, 03 Mar 2010 18:27:32 -0800

I created http://issues.apache.org/jira/browse/HBASE-2285 to handle the
case of a partial transaction at the end of the HLog. Let's discuss a
solution to this problem in the JIRA.
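
As a starting point for that discussion, here is a rough sketch of the marker
idea from the quoted message below: wrap each transaction's edits in start and
end markers, and have recovery drop any transaction whose end marker never
made it to the log. Everything here (WalMarkerReplay, Kind, Record, replay) is
a hypothetical name for illustration and not the actual HLog format or API. It
also assumes transactions are not interleaved in the log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; not HBase's HLog record format.
final class WalMarkerReplay {
    enum Kind { BEGIN, EDIT, END }

    static final class Record {
        final Kind kind;
        final long txId;
        final String edit; // null for BEGIN/END markers

        Record(Kind kind, long txId, String edit) {
            this.kind = kind;
            this.txId = txId;
            this.edit = edit;
        }
    }

    /** Returns, in log order, the edits of transactions that have both markers. */
    static List<String> replay(List<Record> log) {
        List<String> committed = new ArrayList<>();
        List<String> pending = null; // edits of the currently open transaction
        long openTx = -1;
        for (Record r : log) {
            switch (r.kind) {
                case BEGIN:
                    // A BEGIN while another transaction is open means the
                    // earlier one was partial, so its edits are discarded.
                    pending = new ArrayList<>();
                    openTx = r.txId;
                    break;
                case EDIT:
                    if (pending != null && r.txId == openTx) {
                        pending.add(r.edit);
                    }
                    break;
                case END:
                    if (pending != null && r.txId == openTx) {
                        committed.addAll(pending); // transaction is complete
                    }
                    pending = null;
                    openTx = -1;
                    break;
            }
        }
        // Anything still pending here had no END marker and is ignored.
        return committed;
    }

    public static void main(String[] args) {
        List<Record> log = Arrays.asList(
            new Record(Kind.BEGIN, 1, null),
            new Record(Kind.EDIT, 1, "row1/cf:a=1"),
            new Record(Kind.END, 1, null),
            new Record(Kind.BEGIN, 2, null),
            new Record(Kind.EDIT, 2, "row2/cf:a=2")); // crash before END
        System.out.println(replay(log));
    }
}
```

In this sketch only the edits of transaction 1 are replayed; the trailing
edits of transaction 2 are skipped because no end marker follows them.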


thanks
dhruba


On Wed, Mar 3, 2010 at 2:21 PM, Kannan Muthukkaruppan
<kan...@facebook.com> wrote:

> Spoke to Dhruba on (i), and he suggested doing something at the HBase layer,
> such as using markers in the transaction log to clearly delineate the start
> and end boundaries of a transaction, and using them to ignore partial
> transactions in the log during recovery.
>
> > problem here is that the row locks have to be taken out on all rows
> > before everything else in the case of a Put[] else we aren't atomic.
> > And then I think some checks are run under HRegion that we would need
>
> A Put[] cannot guarantee atomicity from a client's perspective anyway,
> given that subsets of the Put[] could be going to different Region Servers.
> So, is this important to preserve within a single Region Server?
>
> regards,
> Kannan
> -----Original Message-----
> From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of
> Jean-Daniel Cryans
> Sent: Wednesday, March 03, 2010 2:11 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: row level atomicity
>
> > (i) It looks like we make several calls to this.write.append(), which in
> > turn does a bunch of individual out.write calls (to the DFSOutputStream),
> > as opposed to just one interaction with the underlying DFS. If so, how do
> > we guarantee that all the edits either make it to HDFS or not, atomically?
> > Or is this just broken?
>
> Yeah, I thought about that too, but I'm not sure how we can do a single
> DFS operation for any number of KVs.
>
>
> > (ii) The updates to memstore should happen after the sync rather than
> > before, correct? Otherwise, there is the danger that the write to DFS
> > fails (the sync fails for some reason) and we return an error to the
> > client, but we have already applied the edits to the memstore. So
> > subsequent reads could serve uncommitted data.
>
> Indeed. The syncWal was taken back up in HRS as a way to optimize
> batch Puts but the fact it's called after all the MemStore operations
> is indeed a problem. I think we need to fix both (i) and (ii) by
> ensuring we do only a single append for whatever we have to put and
> then syncWAL once before processing the MemStore. But the other
> problem here is that, in the case of a Put[], the row locks have to be
> taken out on all rows before everything else, or else we aren't atomic.
> And then I think some checks are run under HRegion that we would need
> to run before everything else.
>
> Quite a big change but I think it's needed.
>
> J-D
>
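
To make the ordering proposed above concrete, here is a minimal sketch: take
all row locks first, do a single append for the whole batch, sync once, and
only then touch the memstore. BatchPutSketch, Wal, putAll and the map-based
memstore are hypothetical stand-ins for illustration, not the HRegion or HLog
API, and the "checks" step is left as a comment because the thread does not
spell it out.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch only; not the HRegion/HLog code.
final class BatchPutSketch {
    /** Stand-in for the write-ahead log. */
    interface Wal {
        void append(Map<String, String> batch) throws IOException; // one append per batch
        void sync() throws IOException;
    }

    private final Wal wal;
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    final Map<String, String> memstore = new ConcurrentSkipListMap<>();

    BatchPutSketch(Wal wal) {
        this.wal = wal;
    }

    /** Applies row -> value puts; the memstore is untouched if the WAL write fails. */
    void putAll(Map<String, String> puts) throws IOException {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            // 1. Lock every row up front, in sorted order to avoid deadlock
            //    between concurrent batches.
            for (String row : new TreeSet<>(puts.keySet())) {
                ReentrantLock lock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            // 2. Any per-row checks would run here, before the log is written.
            // 3. Single append for the whole batch, then a single sync.
            wal.append(puts);
            wal.sync();
            // 4. Only after the sync succeeds are the edits made readable.
            memstore.putAll(puts);
        } finally {
            // Release in reverse acquisition order.
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }
}
```

If sync() throws, the exception reaches the caller before step 4, so readers
never see edits that were not made durable. A single append at this layer may
still turn into several writes on the underlying stream, so this ordering
alone does not remove the need to detect a partial transaction at the end of
the log.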



-- 
Connect to me at http://www.facebook.com/dhruba
