[
https://issues.apache.org/jira/browse/HBASE-15968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Phil Yang updated HBASE-15968:
------------------------------
Description:
The HBase book has a section under Versions called "Current Limitations"; see
http://hbase.apache.org/book.html#_current_limitations
{quote}
28.3. Current Limitations
28.3.1. Deletes mask Puts
Deletes mask puts, even puts that happened after the delete was entered. See
HBASE-2256. Remember that a delete writes a tombstone, which only disappears
after the next major compaction has run. Suppose you do a delete of everything
<= T. After this you do a new put with a timestamp <= T. This put, even if it
happened after the delete, will be masked by the delete tombstone. Performing
the put will not fail, but when you do a get you will notice the put had
no effect. It will start working again after the major compaction has run.
These issues should not be a problem if you use always-increasing versions for
new puts to a row. But they can occur even if you do not care about time: just
do delete and put immediately after each other, and there is some chance they
happen within the same millisecond.
28.3.2. Major compactions change query results
…create three cell versions at t1, t2 and t3, with a maximum-versions setting
of 2. So when getting all versions, only the values at t2 and t3 will be
returned. But if you delete the version at t2 or t3, the one at t1 will appear
again. Obviously, once a major compaction has run, such behavior will not be
the case anymore… (See Garbage Collection in Bending time in HBase.)
{quote}
These limitations result from the current implementation of multi-versioning:
we consider only the timestamp, regardless of when an operation arrives, and we
do not remove an old version immediately when there are enough newer versions.
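To make the two limitations concrete, here is a minimal Python sketch of a toy store that looks only at timestamps. The class TimestampOnlyStore and its methods are hypothetical names used for illustration, not HBase APIs. Major compaction is simplified to "keep exactly what a read would return and drop all tombstones", which ignores settings such as TTL.

```python
class TimestampOnlyStore:
    """Toy model of one column: visibility depends on timestamps only."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.cells = set()      # timestamps of stored puts
        self.tombstones = []    # delete markers as (kind, ts)

    def put(self, ts):
        self.cells.add(ts)

    def delete_upto(self, ts):
        # Models Delete.addColumns: covers every version with timestamp <= ts.
        self.tombstones.append(("upto", ts))

    def delete_version(self, ts):
        # Models Delete.addColumn: covers only the version with this timestamp.
        self.tombstones.append(("version", ts))

    def _masked(self, ts):
        # Arrival order is ignored: any tombstone covering ts hides the cell.
        return any(
            ts <= d_ts if kind == "upto" else ts == d_ts
            for kind, d_ts in self.tombstones
        )

    def read(self):
        live = sorted((ts for ts in self.cells if not self._masked(ts)), reverse=True)
        return live[: self.max_versions]

    def major_compact(self):
        # Simplifying assumption: physically keep only what a read returns.
        self.cells = set(self.read())
        self.tombstones = []


# Limitation 1: a put that arrives after the delete is still masked.
s = TimestampOnlyStore(max_versions=3)
s.put(2)
s.put(3)
s.delete_upto(3)
s.put(1)
masked_read = s.read()            # the new put is hidden by the tombstone
s.major_compact()
s.put(1)
read_after_compaction = s.read()  # the put works once the tombstone is gone

# Limitation 2: the result depends on whether a major compaction has run.
a = TimestampOnlyStore(max_versions=2)
for ts in (1, 2, 3):
    a.put(ts)
a.delete_version(3)
without_compaction = a.read()     # t1 reappears next to t2

b = TimestampOnlyStore(max_versions=2)
for ts in (1, 2, 3):
    b.put(ts)
b.major_compact()                 # t1 is physically dropped here
b.delete_version(3)
with_compaction = b.read()        # only t2 is left
```

In this toy model the same sequence of client operations returns t2 and t1 without a compaction but only t2 with one, which is the inconsistency the book describes.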
We can get stronger version semantics with two guarantees:
1. A Delete will not mask a Put that comes after it.
2. Once a version is masked by enough higher versions (VERSIONS in the CF's
configuration), it will never be seen again.
Some examples to illustrate:
(delete t<=3 means using Delete.addColumns to delete all versions whose ts is
not greater than 3, and delete t3 means using Delete.addColumn to delete the
version whose ts=3)
case 1: put t2 -> put t3 -> delete t<=3 -> put t1, and we will get t1 because
the put comes after the delete.
case 2: maxversion=2, put t1 -> put t2 -> put t3 -> delete t3, and we will
always get t2 whether or not a major compaction has run, because t1 was masked
when we put t3, so t1 will never be seen.
case 3: maxversion=2, put t1 -> put t2 -> put t3 -> delete t2 -> delete t3, and
we will get nothing.
case 4: maxversion=3, put t1 -> put t2 -> put t3 -> delete t2 -> delete t3, and
we will get t1 because it is not masked.
case 5: maxversion=2, put t1 -> put t2 -> put t3 -> delete t3 -> put t1, and we
can get t2+t1 because t3 has already been deleted, so when we put t1 the second
time it is the 2nd latest version and it can be read.
case 6: maxversion=2, put t3 -> put t2 -> put t1, and we will get t3+t2, just
as we do now; ts is still the key of versions.
Different VERSIONS settings may produce different results even when the result
is smaller than VERSIONS (see cases 3 and 4). So Get/Scan.setMaxVersions will be
applied at the end, after we have read the correct data according to the CF's
VERSIONS setting.
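The cases above can be checked against a small executable model. The following Python sketch is one possible reading of the two guarantees, not the proposed HBase implementation. All names (Op, read_strong, cf_versions, get_max_versions) are hypothetical. It assumes that a higher version counts toward masking a put if it was live at some moment while that put existed, and that VERSIONS is 1 for case 1, which does not state it. For case 5 this reading yields t2 and t1, because t3 has already been deleted when the second put of t1 arrives.

```python
from collections import namedtuple

Op = namedtuple("Op", "kind ts")

def put(ts):
    return Op("put", ts)

def delete_version(ts):   # models Delete.addColumn
    return Op("delete_version", ts)

def delete_upto(ts):      # models Delete.addColumns
    return Op("delete_upto", ts)

def _covers(delete, ts):
    if delete.kind == "delete_version":
        return ts == delete.ts
    return ts <= delete.ts

def read_strong(ops, cf_versions, get_max_versions=None):
    """Visible timestamps, newest first, for ops listed in arrival order."""
    puts = [(seq, op.ts) for seq, op in enumerate(ops) if op.kind == "put"]
    deletes = [(seq, op) for seq, op in enumerate(ops) if op.kind != "put"]

    def deleted_between(ts, after, before):
        return any(after < seq < before and _covers(d, ts) for seq, d in deletes)

    latest = {}
    for seq, ts in puts:
        latest[ts] = seq  # a later put of the same timestamp replaces the earlier one

    visible = []
    for ts, seq in latest.items():
        # Guarantee 1: only a delete that arrives after the put can remove it.
        if deleted_between(ts, seq, len(ops)):
            continue
        # Guarantee 2: count higher timestamps that were live while this put existed.
        higher = {
            ts2 for seq2, ts2 in puts
            if ts2 > ts and (seq2 > seq or not deleted_between(ts2, seq2, seq))
        }
        if len(higher) < cf_versions:
            visible.append(ts)

    visible.sort(reverse=True)
    # The per-request limit is applied last, after CF-level masking.
    return visible if get_max_versions is None else visible[:get_max_versions]


case1 = read_strong([put(2), put(3), delete_upto(3), put(1)], cf_versions=1)
case2 = read_strong([put(1), put(2), put(3), delete_version(3)], cf_versions=2)
ops34 = [put(1), put(2), put(3), delete_version(2), delete_version(3)]
case3 = read_strong(ops34, cf_versions=2)
case4 = read_strong(ops34, cf_versions=3)
case5 = read_strong(
    [put(1), put(2), put(3), delete_version(3), put(1)], cf_versions=2
)
case6 = read_strong([put(3), put(2), put(1)], cf_versions=2)

# A request-level limit of 2 on a CF with VERSIONS=3 still returns t1,
# while a CF with VERSIONS=2 returns nothing for the same operations.
limited = read_strong(ops34, cf_versions=3, get_max_versions=2)
```

The last line shows why the request-level limit has to be applied at the end: using it in place of the CF's VERSIONS during masking would hide t1 in case 4.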
The semantics differ from current HBase, and we may need more logic to support
the new semantics, so the feature is configurable and disabled by default.
was:
In HBase book, we have a section in Versions called "Current Limitations" see
http://hbase.apache.org/book.html#_current_limitations
{quote}
28.3. Current Limitations
28.3.1. Deletes mask Puts
Deletes mask puts, even puts that happened after the delete was entered. See
HBASE-2256. Remember that a delete writes a tombstone, which only disappears
after then next major compaction has run. Suppose you do a delete of everything
⇐ T. After this you do a new put with a timestamp ⇐ T. This put, even if it
happened after the delete, will be masked by the delete tombstone. Performing
the put will not fail, but when you do a get you will notice the put did have
no effect. It will start working again after the major compaction has run.
These issues should not be a problem if you use always-increasing versions for
new puts to a row. But they can occur even if you do not care about time: just
do delete and put immediately after each other, and there is some chance they
happen within the same millisecond.
28.3.2. Major compactions change query results
…create three cell versions at t1, t2 and t3, with a maximum-versions setting
of 2. So when getting all versions, only the values at t2 and t3 will be
returned. But if you delete the version at t2 or t3, the one at t1 will appear
again. Obviously, once a major compaction has run, such behavior will not be
the case anymore… (See Garbage Collection in Bending time in HBase.)
{quote}
These limitations result from the current implementation on multi-versions: we
only consider timestamp, no matter when it comes; we will not remove old
version immediately if there are enough number of new versions.
So we can get a stronger semantics of versions by two guarantees:
1, Delete will not mask Put that comes after it.
2, If a version is masked by enough number of higher versions (MAXVERSIONS), it
will never be seen any more.
Some examples for understanding:
(delete t<=3 means use Delete.addColumns to delete all versions whose ts is not
greater than 3, and delete t3 means use Delete.addColumn to delete the version
whose ts=3)
case 1: put t2 -> put t3 -> delete t<=3 -> put t1, and we will get t1 because
the put is after delete.
case 2: maxversion=2, put t1 -> put t2 -> put t3 -> delete t3, and we will
always get t2 no matter if there is a major compaction, because t1 is masked
when we put t3 so t1 will never be seen.
case 3: maxversion=2, put t1 -> put t2 -> put t3 -> delete t2 -> delete t3, and
we will get nothing.
case 4: maxversion=3, put t1 -> put t2 -> put t3 -> delete t2 -> delete t3, and
we will get t1 because it is not masked.
case 5: maxversion=2, put t1 -> put t2 -> put t3 -> delete t3 -> put t1, and we
can get t3+t1 because when we put t1 at second time it is the 2nd latest
version and it can be read.
case 6:maxversion=2, put t3->put t2->put t1, and we will get t3+t2 just like
what we can get now, ts is still the key of versions.
Different MAXVERSIONS may result in different results even the size of result
is smaller than MAXVERSIONS(see case 3 and 4). So Get/Scan.setMaxVersions will
be handled at end after we read correct data according to CF's MAXVERSIONS
setting.
The semantics is different from the current HBase, and we may need more logic
to support the new semantic, so it is configurable and default is disabled.
> Strong semantics of versions
> ----------------------------
>
> Key: HBASE-15968
> URL: https://issues.apache.org/jira/browse/HBASE-15968
> Project: HBase
> Issue Type: New Feature
> Reporter: Phil Yang
> Assignee: Phil Yang
>