[ 
https://issues.apache.org/jira/browse/HBASE-15968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094426#comment-16094426
 ] 

stack commented on HBASE-15968:
-------------------------------

You don't think this could make 2.0.0 [~yangzhe1991] ?

> New behavior of versions considering mvcc and ts rather than ts only
> --------------------------------------------------------------------
>
>                 Key: HBASE-15968
>                 URL: https://issues.apache.org/jira/browse/HBASE-15968
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>             Fix For: 3.0.0
>
>         Attachments: HBASE-15968.v05.patch, HBASE-15968-v1.patch, 
> HBASE-15968-v2.patch, HBASE-15968-v3.patch, HBASE-15968-v4.patch
>
>
> In HBase book, we have a section in Versions called "Current Limitations" see 
> http://hbase.apache.org/book.html#_current_limitations
> {quote}
> 28.3. Current Limitations
> 28.3.1. Deletes mask Puts
> Deletes mask puts, even puts that happened after the delete was entered. See 
> HBASE-2256. Remember that a delete writes a tombstone, which only disappears 
> after then next major compaction has run. Suppose you do a delete of 
> everything ⇐ T. After this you do a new put with a timestamp ⇐ T. This put, 
> even if it happened after the delete, will be masked by the delete tombstone. 
> Performing the put will not fail, but when you do a get you will notice the 
> put did have no effect. It will start working again after the major 
> compaction has run. These issues should not be a problem if you use 
> always-increasing versions for new puts to a row. But they can occur even if 
> you do not care about time: just do delete and put immediately after each 
> other, and there is some chance they happen within the same millisecond.
> 28.3.2. Major compactions change query results
> …​create three cell versions at t1, t2 and t3, with a maximum-versions 
> setting of 2. So when getting all versions, only the values at t2 and t3 will 
> be returned. But if you delete the version at t2 or t3, the one at t1 will 
> appear again. Obviously, once a major compaction has run, such behavior will 
> not be the case anymore…​ (See Garbage Collection in Bending time in HBase.)
> {quote}
> These limitations result from the current implementation on multi-versions: 
> we only consider timestamp, no matter when it comes; we will not remove old 
> version immediately if there are enough number of new versions. 
> So we can get a stronger semantics of versions by two guarantees:
> 1, Delete will not mask Put that comes after it.
> 2, If a version is masked by enough number of higher versions (VERSIONS in 
> cf's conf), it will never be seen any more.
> Some examples for understanding:
> (delete t<=3 means use Delete.addColumns to delete all versions whose ts is 
> not greater than 3, and delete t3 means use Delete.addColumn to delete the 
> version whose ts=3)
> case 1: put t2 -> put t3 -> delete t<=3 -> put t1, and we will get t1 because 
> the put is after delete.
> case 2: maxversion=2, put t1 -> put t2 -> put t3 -> delete t3, and we will 
> always get t2 no matter if there is a major compaction, because t1 is masked 
> when we put t3 so t1 will never be seen.
> case 3: maxversion=2, put t1 -> put t2 -> put t3 -> delete t2 -> delete t3, 
> and we will get nothing.
> case 4: maxversion=3, put t1 -> put t2 -> put t3 -> delete t2 -> delete t3, 
> and we will get t1 because it is not masked.
> case 5: maxversion=2, put t1 -> put t2 -> put t3 -> delete t3 -> put t1, and 
> we can get t3+t1 because when we put t1 at second time it is the 2nd latest 
> version and it can be read.
> case 6:maxversion=2, put t3->put t2->put t1, and we will get t3+t2 just like 
> what we can get now, ts is still the key of versions.
> Different VERSIONS may result in different results even the size of result is 
> smaller than VERSIONS(see case 3 and 4).  So Get/Scan.setMaxVersions will be 
> handled at end after we read correct data according to CF's  VERSIONS setting.
> The semantics is different from the current HBase, and we may need more logic 
> to support the new semantic, so it is configurable and default is disabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to