Re: memstore timestamp and visible timestamp

Wei Tan Fri, 03 Aug 2012 12:44:13 -0700

Hi Lars,

   "Since the region server also hands out the TSs based on wall clock 
time (and assuming time does not go backwards) it follows that a KV 
assigned a later memTS cannot have an earlier TS."


   I assume that this applies ONLY when we talk about two KVs in the SAME 
row? I read the code of put() and found that the row is locked on entering 
a put, then the TS is assigned, and later the memTS is assigned. This makes 
sense: only after this put is done can another put obtain the row lock, so 
it will obtain a TS that is no smaller and a larger memTS. 

  However, this does NOT hold for two KVs that belong to different rows, 
right? Say we have two KVs: KV1 can enter the put earlier and get a 
smaller TS1, but be delayed a little in the code path and possibly get 
its memTS after KV2, correct?
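
  To make the question concrete, here is a toy model of the two 
interleavings. It is not HBase code: the class, its methods, and the 
counter standing in for the wall clock are all made up for illustration, 
and whether the real write path permits the second interleaving is exactly 
what I am asking.

```java
/** Toy model of the put path discussed above; this is not HBase code. */
final class PutOrderingSketch {
    private long wallClockMs = 1_000; // stand-in for the region server's clock
    private long nextMemTs = 1;       // stand-in for the MVCC write number

    /** Step 1 of a put: take the application-visible TS from the clock. */
    long assignTs() {
        return wallClockMs++; // pretend one millisecond passes per call
    }

    /** Step 2 of a put: take the next memTS (write number). */
    long assignMemTs() {
        return nextMemTs++;
    }

    /** Same row: the row lock serializes the puts, so their steps cannot interleave. */
    static long[] sameRow() {
        PutOrderingSketch rs = new PutOrderingSketch();
        long ts1 = rs.assignTs();
        long memTs1 = rs.assignMemTs();
        long ts2 = rs.assignTs();
        long memTs2 = rs.assignMemTs();
        return new long[] {ts1, memTs1, ts2, memTs2};
    }

    /** Different rows: no shared lock, so put 1 may stall between its two steps. */
    static long[] differentRows() {
        PutOrderingSketch rs = new PutOrderingSketch();
        long ts1 = rs.assignTs();       // put 1 gets its TS first,
        long ts2 = rs.assignTs();       // then put 2 gets a TS
        long memTs2 = rs.assignMemTs(); // and a memTS
        long memTs1 = rs.assignMemTs(); // before put 1 gets its memTS
        return new long[] {ts1, memTs1, ts2, memTs2};
    }
}
```

  In sameRow() the later put has both the larger TS and the larger memTS; 
in differentRows() put 1 has the smaller TS but the larger memTS.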

  Again, thanks :-)

Best Regards,
Wei

Wei Tan 
Research Staff Member 
IBM T. J. Watson Research Center
19 Skyline Dr, Hawthorne, NY  10532
[email protected]; 914-784-6752



From:   lars hofhansl <[email protected]>
To:     "[email protected]" <[email protected]>, 
Cc:     "[email protected]" <[email protected]>
Date:   08/03/2012 03:14 PM
Subject:        Re: memstore timestamp and visible timestamp



I see. This is not so much a stated guarantee as a fact that follows from 
the implementation.


The memTS is handed out per region server - which is fine, because the 
only consistency guarantee HBase makes is for KVs of the same row,
and these are always colocated in the same region (and hence the same 
region server).
Since the region server also hands out the TSs based on wall clock time 
(and assuming time does not go backwards) it follows that a KV assigned a 
later memTS cannot have an earlier TS.

Of course that is not the case if you use client assigned TSs.

Maybe I should write a followup blog post that more clearly describes the 
relationship (or rather the absence thereof) between the memTS and the TS.


The gist is that the memTS is strictly internal to guarantee ACID 
properties (and HBase could have used readlocks for this as well, and if 
it did that would be transparent to the outside),
whereas the TS is an application-level concept; it is part of the data (so 
to speak).


-- Lars
________________________________
From: Wei Tan <[email protected]>
To: [email protected] 
Cc: "[email protected]" <[email protected]> 
Sent: Friday, August 3, 2012 7:21 AM
Subject: Re: memstore timestamp and visible timestamp

Hi Lars,

   Appreciate your reply. Actually I read your blog posting and then had 
that question. I am very interested in how you guarantee this:

   Also note that if you use the Region Server assigned TSs then mTS1<mTS2 
implies TS1<=TS2 (the update might happen with the same ms).

  If you have a pointer explaining this, I would like to read it. 
Otherwise I will dig into the code later today. I remember reading the 
0.92.0 code and not finding many clues, but I will try again.



Best Regards,
Wei




From:   lars hofhansl <[email protected]>
To:     "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>, 
Date:   08/02/2012 07:35 PM
Subject:        Re: memstore timestamp and visible timestamp



Hi Wei,

you have to distinguish between "visible to other concurrent scanners" and 
"visible to a client".
What's visible to a client is determined by what the client wants to see, 
based on the application-visible timestamp (TS).

The visibility to concurrent scanners is controlled by the memstoreTS 
(mTS) to avoid "strange" states due to parallel updates.
HBase here guards against partially visible "transactions" (i.e. a Put of 
many columns that fails after it applied the changes to some of the 
columns).
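
As a rough illustration of that guard, here is a toy model in which a 
scanner only sees cells whose mTS is at or below a read point. The class 
and method names are made up for this sketch, and it assumes Puts complete 
in mTS order, which the real implementation does not rely on.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of mTS-based visibility; names and structure are illustrative only. */
final class MvccSketch {
    private static final class Cell {
        final String column;
        final long memTs;
        Cell(String column, long memTs) {
            this.column = column;
            this.memTs = memTs;
        }
    }

    private final List<Cell> memstore = new ArrayList<>();
    private long nextMemTs = 1;
    private long readPoint = 0; // scanners see only cells with memTs <= readPoint

    /** Start a Put and reserve its mTS. */
    long beginPut() {
        return nextMemTs++;
    }

    /** Apply one column of the Put; it sits in the memstore but is not yet visible. */
    void apply(long memTs, String column) {
        memstore.add(new Cell(column, memTs));
    }

    /** Finish the Put. Assumes Puts finish in mTS order (a simplification). */
    void completePut(long memTs) {
        readPoint = Math.max(readPoint, memTs);
    }

    /** Columns that a scanner opened right now would see. */
    List<String> scan() {
        List<String> visible = new ArrayList<>();
        for (Cell c : memstore) {
            if (c.memTs <= readPoint) {
                visible.add(c.column);
            }
        }
        return visible;
    }
}
```

A Put that has applied only some of its columns, or that fails before 
completePut(), never moves the read point, so scan() shows either all of 
its columns or none of them.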

The scenario you describe below is indeed desired. Note that a client can 
request seeing the older versions too, so the older edit (in terms of TS) 
is not lost.
Also note that if you use the Region Server assigned TSs then mTS1<mTS2 
implies TS1<=TS2 (the update might happen with the same ms).
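
A small sketch of why the older edit survives, assuming a cell is simply a 
map from TS to value read newest-first. The class below is hypothetical and 
stands in for the versioned storage; it is not the HBase client API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Toy versioned cell: values keyed by application-visible TS, newest first. */
final class VersionedCellSketch {
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Comparator.<Long>reverseOrder());

    /** Store a value under its TS; the order in which puts commit plays no role here. */
    void put(long ts, String value) {
        versions.put(ts, value);
    }

    /** What a client sees by default: the value with the highest TS. */
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    /** What a client sees when it asks for up to maxVersions versions. */
    List<String> newest(int maxVersions) {
        List<String> out = new ArrayList<>();
        for (String value : versions.values()) {
            if (out.size() == maxVersions) {
                break;
            }
            out.add(value);
        }
        return out;
    }
}
```

If v1 is stored with TS 200 and v2 is committed afterwards with TS 100, 
latest() still returns v1, while newest(2) returns both v1 and v2.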

If you do not mind a longer read, I have written about this here: 
http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html

Let me know if that makes any sense.

-- Lars


----- Original Message -----
From: Wei Tan <[email protected]>
To: [email protected]
Cc: 
Sent: Thursday, August 2, 2012 3:35 PM
Subject: memstore timestamp and visible timestamp

Hi,

  I have a question regarding the correlation between the visible 
timestamp of a KV (denoted as ts) and its memstore timestamp (aka, the 
write number, denoted as memts). Reading the HRegion.java code it seems 
that these two are independently assigned. Let's assume two concurrent 
puts: (k, v1) and (k, v2).


  Suppose somehow memts(k,v1) < memts(k,v2); then (k,v1) will be committed 
and become visible before (k,v2). 
If ts(k,v1) < ts(k,v2), then after both KVs commit, (k,v2) becomes the 
latest version.
Otherwise, if ts(k,v1) > ts(k,v2), then the "later" (w.r.t. MVCC) KV, 
(k,v2), becomes stale as soon as it commits and is still not visible. Is 
this a desirable feature?


  Am I understanding it correctly, that memts(k,v1) < memts(k, v2) does 
not indicate that ts(k,v1) < ts(k, v2), and vice versa? 
PS: let's talk about the hbase region server assigned, not user assigned, 
visible timestamp.

  Thanks,

Wei

Best Regards,
Wei

