On Wed, Mar 3, 2010 at 2:11 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>> (i) It looks like we make several calls to this.write.append(), each of
>> which in turn does a number of individual out.write() calls (to the
>> DFSOutputStream), as opposed to a single interaction with the underlying
>> DFS. If so, how do we guarantee that the edits either all make it to HDFS
>> or none do? Or is this just broken?
>
> Yeah, I thought about that too, but I'm not sure how we can do a single
> DFS operation for any number of KVs.

Do we really need a single actual DFS atomic write operation?  If we
had some kind of end-of-row marker, would that help instead?
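
To make the end-of-row marker idea concrete, here is a minimal Java sketch. It assumes a hypothetical log format (`LogRecord` and `MarkerReplay` are made-up names, not HLog classes) in which each batch of edits is followed by a commit marker. On replay, edits are applied only if their batch was closed by a marker, so a batch cut short between append() calls is ignored and no single atomic DFS write is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical log record: either one edit or the commit marker that closes a batch.
final class LogRecord {
    final long batchId;
    final String edit;          // null for a commit marker
    final boolean commitMarker;

    private LogRecord(long batchId, String edit, boolean commitMarker) {
        this.batchId = batchId;
        this.edit = edit;
        this.commitMarker = commitMarker;
    }

    static LogRecord edit(long batchId, String edit) { return new LogRecord(batchId, edit, false); }

    static LogRecord commit(long batchId) { return new LogRecord(batchId, null, true); }
}

final class MarkerReplay {
    // Returns only the edits whose batch was closed by a commit marker. A batch
    // with no marker (e.g. the writer died between append() calls) is discarded.
    static List<String> replay(List<LogRecord> log) {
        List<String> applied = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        long currentBatch = Long.MIN_VALUE;
        for (LogRecord r : log) {
            if (r.batchId != currentBatch) {
                // A new batch began before the previous one committed: drop the partial batch.
                pending.clear();
                currentBatch = r.batchId;
            }
            if (r.commitMarker) {
                applied.addAll(pending);
                pending.clear();
            } else {
                pending.add(r.edit);
            }
        }
        return applied; // whatever is left in 'pending' was never committed
    }
}
```

Replaying edit(1, "r1/a"), edit(1, "r1/b"), commit(1), edit(2, "r2/a") yields only the two batch-1 edits; batch 2 has no marker and is dropped.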

>
>
>> (ii) The updates to the memstore should happen after the sync rather
>> than before, correct? Otherwise, there is the danger that the write to
>> DFS fails (e.g. the sync fails for some reason) and we return an error
>> to the client, but we have already applied the edits to the memstore.
>> Subsequent reads could then serve uncommitted data.
>
> Indeed. The syncWal call was moved back up into HRS as a way to optimize
> batch Puts, but the fact that it's called after all the MemStore
> operations is indeed a problem. I think we need to fix both (i) and (ii)
> by ensuring we do only a single append for whatever we have to put and
> then sync the WAL once, before processing the MemStore. The other
> problem, though, is that in the case of a Put[] the row locks have to
> be taken on all rows before anything else, or we aren't atomic. And I
> think some checks that are currently run in HRegion would also need to
> run before everything else.
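
A minimal Java sketch of the ordering proposed above, assuming a hypothetical `WriteAheadLog` interface (not the real HLog API) and a plain list standing in for the MemStore: take all row locks first, in sorted order so two concurrent Put[] batches cannot deadlock; do one append and one sync; and only then apply the edits to the MemStore. If the append or sync throws, the MemStore is never touched.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the write-ahead log; these are not HLog's real methods.
interface WriteAheadLog {
    void appendBatch(List<String> edits) throws IOException; // one append for the whole Put[]
    void sync() throws IOException;
}

final class BatchPutSketch {
    // Per-row locks; never pruned in this sketch.
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    // Stand-in for the MemStore; edits are assumed to be "row/value" strings.
    private final List<String> memstore = new CopyOnWriteArrayList<>();
    private final WriteAheadLog wal;

    BatchPutSketch(WriteAheadLog wal) { this.wal = wal; }

    List<String> memstoreContents() { return Collections.unmodifiableList(memstore); }

    void put(List<String> edits) throws IOException {
        // 1. Collect the distinct rows in sorted order and lock all of them up front,
        //    so concurrent batches always acquire locks in the same order.
        TreeSet<String> rows = new TreeSet<>();
        for (String e : edits) rows.add(e.substring(0, e.indexOf('/')));
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (String row : rows) {
                ReentrantLock lock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            // 2. A single append and a single sync, before the MemStore is touched.
            wal.appendBatch(edits);
            wal.sync();
            // 3. Only after the sync succeeds do the edits become visible to readers.
            memstore.addAll(edits);
        } finally {
            for (ReentrantLock lock : held) lock.unlock();
        }
    }
}
```

Acquiring locks in a fixed (sorted) order is a common way to avoid deadlock between multi-row operations; the checks mentioned above that currently run in HRegion would slot in after step 1.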

I have a patch to make atomic changes to the memstore without requiring
any real _os_-level atomic primitives. So going forward we can cover that
case without locks.

But as you said, what happens if hlog append fails?  The obvious thing
would be to remove the additions from the memstore.  But how to
accomplish this easily?

If each logical insert into the memstore had a different 'version number',
it might be easier, hey?
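
One way the 'version number' idea could work, as a minimal Java sketch with hypothetical names (`VersionedMemstoreSketch`, `insert`, `commit`, `rollback` are not HBase APIs): every logical insert gets its own write number, each cell is tagged with it, and a failed hlog append is undone by removing exactly the cells carrying that number, without disturbing other writes to the same rows. This sketch does not hide an in-flight insert from readers before it is committed or rolled back; that would need a separate visibility rule.

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

final class VersionedMemstoreSketch {
    private final AtomicLong nextWriteNumber = new AtomicLong(1);
    // row -> (write number -> value); a higher write number is a newer insert
    private final Map<String, NavigableMap<Long, String>> cells = new ConcurrentHashMap<>();
    // write number -> rows it touched, kept until the insert is committed or rolled back
    private final Map<Long, List<String>> pendingWrites = new ConcurrentHashMap<>();

    // Applies one logical insert and tags every cell with a fresh write number.
    long insert(Map<String, String> rowValues) {
        long wn = nextWriteNumber.getAndIncrement();
        pendingWrites.put(wn, List.copyOf(rowValues.keySet()));
        for (Map.Entry<String, String> e : rowValues.entrySet()) {
            cells.computeIfAbsent(e.getKey(), r -> new ConcurrentSkipListMap<>())
                 .put(wn, e.getValue());
        }
        return wn;
    }

    // The log append/sync succeeded: the undo information is no longer needed.
    void commit(long writeNumber) {
        pendingWrites.remove(writeNumber);
    }

    // The log append failed: remove exactly the cells carrying this write number,
    // leaving other inserts to the same rows untouched.
    void rollback(long writeNumber) {
        List<String> rows = pendingWrites.remove(writeNumber);
        if (rows == null) return; // unknown or already committed
        for (String row : rows) {
            NavigableMap<Long, String> versions = cells.get(row);
            if (versions != null) versions.remove(writeNumber);
        }
    }

    // Newest surviving value for a row, or null if there is none.
    String get(String row) {
        NavigableMap<Long, String> versions = cells.get(row);
        Map.Entry<Long, String> newest = versions == null ? null : versions.lastEntry();
        return newest == null ? null : newest.getValue();
    }
}
```

For example, after a committed insert of r1=v1, a second insert of r1=v2, r2=x that is then rolled back leaves get("r1") returning v1 and get("r2") returning null.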



>
> Quite a big change but I think it's needed.
>
> J-D
>
