Phil Yang commented on HBASE-15968:
> MVCC-sensitive semantics of versions
> Key: HBASE-15968
> URL: https://issues.apache.org/jira/browse/HBASE-15968
> Project: HBase
> Issue Type: New Feature
> Reporter: Phil Yang
> Assignee: Phil Yang
> Attachments: HBASE-15968-v1.patch, HBASE-15968-v2.patch
> In the HBase book, the Versions chapter has a section called "Current Limitations":
> 28.3. Current Limitations
> 28.3.1. Deletes mask Puts
> Deletes mask puts, even puts that happened after the delete was entered. See
> HBASE-2256. Remember that a delete writes a tombstone, which only disappears
> after the next major compaction has run. Suppose you do a delete of
> everything <= T. After this you do a new put with a timestamp <= T. This put,
> even if it happened after the delete, will be masked by the delete tombstone.
> Performing the put will not fail, but when you do a get you will notice the
> put had no effect. It will start working again after the major
> compaction has run. These issues should not be a problem if you use
> always-increasing versions for new puts to a row. But they can occur even if
> you do not care about time: just do delete and put immediately after each
> other, and there is some chance they happen within the same millisecond.
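The masking described in 28.3.1 can be sketched with a toy model. This is a minimal illustration, not HBase code: the class `TimestampOnlyMasking` and its methods are hypothetical names, and a single tombstone timestamp stands in for the delete marker.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of timestamp-only delete masking for one column; not HBase code. */
final class TimestampOnlyMasking {
    // Tombstone left by "delete everything <= T"; it survives until major compaction.
    private long tombstoneTs = Long.MIN_VALUE;
    private final List<Long> putTimestamps = new ArrayList<>();

    void put(long ts) {
        putTimestamps.add(ts); // the write itself always succeeds
    }

    void deleteUpTo(long ts) {
        tombstoneTs = Math.max(tombstoneTs, ts);
    }

    /** Drops the tombstone together with every cell it covers. */
    void majorCompact() {
        putTimestamps.removeIf(ts -> ts <= tombstoneTs);
        tombstoneTs = Long.MIN_VALUE;
    }

    /** Only the timestamp is compared; arrival order of put and delete is ignored. */
    List<Long> get() {
        List<Long> visible = new ArrayList<>();
        for (long ts : putTimestamps) {
            if (ts > tombstoneTs) {
                visible.add(ts);
            }
        }
        return visible;
    }
}
```

In this model, `deleteUpTo(10)` followed by `put(7)` leaves `get()` empty, because the put arrived later but carries a timestamp the tombstone covers. After `majorCompact()`, a new `put(7)` becomes visible.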
> 28.3.2. Major compactions change query results
> …create three cell versions at t1, t2 and t3, with a maximum-versions
> setting of 2. So when getting all versions, only the values at t2 and t3 will
> be returned. But if you delete the version at t2 or t3, the one at t1 will
> appear again. Obviously, once a major compaction has run, such behavior will
> not be the case anymore… (See Garbage Collection in Bending time in HBase.)
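The compaction dependence in 28.3.2 can be reproduced with a similar toy model. The class `CompactionSensitiveVersions` is a hypothetical name, and the sketch assumes that a major compaction keeps only the newest non-deleted versions up to the maximum-versions setting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of one column whose reads depend on compaction timing; not HBase code. */
final class CompactionSensitiveVersions {
    private final int maxVersions;
    private final TreeSet<Long> stored = new TreeSet<>();  // versions physically present
    private final TreeSet<Long> deleted = new TreeSet<>(); // per-version tombstones

    CompactionSensitiveVersions(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long ts) {
        stored.add(ts);
    }

    void deleteVersion(long ts) {
        deleted.add(ts);
    }

    /** Returns the newest non-deleted versions, at most maxVersions of them. */
    List<Long> get() {
        List<Long> result = new ArrayList<>();
        for (long ts : stored.descendingSet()) {
            if (deleted.contains(ts)) {
                continue; // masked by a tombstone
            }
            if (result.size() == maxVersions) {
                break;
            }
            result.add(ts);
        }
        return result;
    }

    /** Keeps only what a read would return and discards the tombstones. */
    void majorCompact() {
        List<Long> survivors = get();
        stored.clear();
        stored.addAll(survivors);
        deleted.clear();
    }
}
```

With a maximum of 2 versions and puts at 1, 2 and 3, calling `deleteVersion(3)` before any compaction makes `get()` return `[2, 1]`: the version at 1 reappears. If `majorCompact()` runs before the same delete, `get()` returns only `[2]`.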
> These limitations result from the current implementation of multiple versions:
> we consider only the timestamp, regardless of when a write arrives, and we do
> not remove an old version immediately even when enough newer versions exist.
> So we can get stronger version semantics with two guarantees:
> 1. A Delete will not mask a Put that comes after it.
> 2. Once a version is masked by enough higher versions (VERSIONS in the
> CF's conf), it will never be seen again.
> Some examples for understanding:
> (delete t<=3 means use Delete.addColumns to delete all versions whose ts is
> not greater than 3, and delete t3 means use Delete.addColumn to delete the
> version whose ts=3)
> case 1: put t2 -> put t3 -> delete t<=3 -> put t1, and we will get t1 because
> the put comes after the delete.
> case 2: maxversion=2, put t1 -> put t2 -> put t3 -> delete t3, and we will
> always get t2 whether or not a major compaction has run, because t1 was masked
> when we put t3, so t1 will never be seen.
> case 3: maxversion=2, put t1 -> put t2 -> put t3 -> delete t2 -> delete t3,
> and we will get nothing.
> case 4: maxversion=3, put t1 -> put t2 -> put t3 -> delete t2 -> delete t3,
> and we will get t1 because it is not masked.
> case 5: maxversion=2, put t1 -> put t2 -> put t3 -> delete t3 -> put t1, and
> we can get t2+t1 because when we put t1 the second time it is the 2nd latest
> version and it can be read.
> case 6: maxversion=2, put t3 -> put t2 -> put t1, and we will get t3+t2, just
> as we do now, because ts is still the key that orders versions.
> Different VERSIONS settings may produce different results even when the result
> is smaller than VERSIONS (see cases 3 and 4). So Get/Scan.setMaxVersions will be
> applied last, after we read the correct data according to the CF's VERSIONS setting.
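The two guarantees and the read-side rule above can be sketched as a toy model. This is an assumption-laden illustration, not the patch's implementation: the class `MvccVersionModel` and its methods are hypothetical names, and the order of method calls stands in for MVCC order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of the proposed semantics for one column; not HBase code. */
final class MvccVersionModel {
    private final int cfVersions;                        // VERSIONS in the CF's conf
    private final TreeSet<Long> live = new TreeSet<>();  // timestamps still readable

    MvccVersionModel(int cfVersions) {
        this.cfVersions = cfVersions;
    }

    /** Guarantee 1: earlier deletes were already applied, so they cannot mask this put. */
    void put(long ts) {
        live.add(ts);
        // Guarantee 2: once VERSIONS higher versions exist, lower ones are gone for good.
        while (live.size() > cfVersions) {
            live.pollFirst();
        }
    }

    /** Models Delete.addColumn: removes the already-written version with exactly this ts. */
    void deleteVersion(long ts) {
        live.remove(ts);
    }

    /** Models Delete.addColumns: removes already-written versions with ts <= the given ts. */
    void deleteUpTo(long ts) {
        live.headSet(ts, true).clear();
    }

    /** The read-side limit (Get/Scan.setMaxVersions) is applied last, newest first. */
    List<Long> get(int readMaxVersions) {
        List<Long> result = new ArrayList<>();
        for (long ts : live.descendingSet()) {
            if (result.size() == readMaxVersions) {
                break;
            }
            result.add(ts);
        }
        return result;
    }
}
```

For example, with `new MvccVersionModel(2)`, the sequence `put(1)`, `put(2)`, `put(3)`, `deleteVersion(3)` followed by `get(2)` yields `[2]`, matching case 2. Running case 3's sequence yields an empty list with VERSIONS=2, while the same sequence with VERSIONS=3 (case 4) yields `[1]`.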
> The semantics differ from those of current HBase, and we may need more logic
> to support them, so the feature is configurable and disabled by default.