[ https://issues.apache.org/jira/browse/HBASE-8721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686862#comment-13686862 ]
Feng Honghua commented on HBASE-8721:
-------------------------------------
Another benefit of the behaviour "delete can't mask puts that happened after it"
(in essence, mvcc also participates in delete handling): the 'delete latest
version' operation (deleteColumn() without a timestamp) can perform better,
because the RS no longer needs the read that fetches the timestamp of the
latest version and sets it on the delete.
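To make the cost of that read concrete, here is a toy Python model (not HBase
code) of today's read-then-write 'delete latest version'. Cell, the in-memory
store and delete_latest_with_read are hypothetical names; the only point is
that the server must read the column before it can stamp the delete with the
latest timestamp.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    ts: int     # client-visible timestamp
    mvcc: int   # hypothetical write number assigned by the RS (arrival order)
    value: str


def delete_latest_with_read(
    store: Dict[str, List[Cell]], column: str
) -> Optional[Tuple[str, str, int]]:
    """Toy model of the read-then-write 'delete latest version'.

    The server first reads the column (the extra read this comment wants to
    remove) to learn the newest timestamp, then emits a delete marker stamped
    with exactly that timestamp.
    """
    versions = store.get(column, [])
    if not versions:
        # Simplification for this sketch: nothing to delete, no marker emitted.
        return None
    latest_ts = max(c.ts for c in versions)  # the read
    return ("DeleteVersion", column, latest_ts)
```

For example, with versions at timestamps 10 and 20 the sketch returns a marker
for timestamp 20, but only after scanning the stored versions.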
Below is the update process for 'delete latest version' under the 'delete
can't mask puts that happened after it' behaviour:
1. When the client issues deleteColumn() without a timestamp, the delete's
timestamp is set to an 'invalid' value (0 or -1 is a good candidate) meaning
'delete the latest version'. The RS writes this Delete-type kv just like other
types of delete, without any read.
2. On Get/Scan, timestamp 0/-1 tells us that this delete targets the latest
version, so we check the kvs it sees. The first kv (in scan order, i.e. newest
timestamp first) whose mvcc is smaller than the delete's mvcc is the version
that was latest when the delete entered the RS. After this first kv is masked
(with the mvcc check), the 'delete latest version' delete must also be removed
from the ScanDeleteTracker, since it masks exactly one version.
That's all.
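A minimal Python sketch of these two steps, assuming a toy model in which every
cell carries a client timestamp and an mvcc write number. LATEST_SENTINEL,
DeleteMarker, write_delete_latest and scan_column are hypothetical names, and
the list of pending markers merely stands in for ScanDeleteTracker; this is
not the attached patch.

```python
from dataclasses import dataclass
from typing import List

LATEST_SENTINEL = -1  # hypothetical 'invalid' timestamp: "delete the latest version"


@dataclass(frozen=True)
class Cell:
    ts: int
    mvcc: int
    value: str


@dataclass(frozen=True)
class DeleteMarker:
    ts: int
    mvcc: int


def write_delete_latest(next_mvcc: int) -> DeleteMarker:
    # Step 1: no read. Just record the sentinel timestamp and the delete's mvcc.
    return DeleteMarker(ts=LATEST_SENTINEL, mvcc=next_mvcc)


def scan_column(cells: List[Cell], deletes: List[DeleteMarker]) -> List[Cell]:
    """Step 2: apply 'delete latest' markers while scanning one column."""
    # Pending markers, oldest first (matching order is a choice of this sketch).
    pending = sorted(
        (d for d in deletes if d.ts == LATEST_SENTINEL), key=lambda d: d.mvcc
    )
    visible: List[Cell] = []
    # Visit cells newest timestamp first, as a store scan would.
    for cell in sorted(cells, key=lambda c: (-c.ts, -c.mvcc)):
        # The first cell written before a marker was the latest version
        # when that marker arrived, so the marker masks it.
        idx = next((i for i, d in enumerate(pending) if cell.mvcc < d.mvcc), None)
        if idx is not None:
            pending.pop(idx)  # a 'delete latest' marker masks exactly one cell
            continue
        visible.append(cell)
    return visible
```

With v1 (ts 10) and v2 (ts 20) written before the delete and v3 (ts 30) written
after it, scan_column masks v2 and returns v3 and v1.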
Why, then, can't we have such a lightweight (read-free) 'delete latest
version' today? The root cause is the 'delete can mask puts that happen after
it' behaviour, which does not use mvcc in delete handling.
When issuing 'delete latest version' (deleteColumn() without a timestamp), the
real semantics is 'delete the latest of all currently EXISTING versions', where
EXISTING means the versions written BEFORE the delete enters the RS. BEFORE is
a notion of operation order (indicated by mvcc), which timestamps cannot
represent.
So why exactly does this rule out handling 'delete latest version' without a
read, as in the process above? Because a newer version with a bigger timestamp
(later than the version that was latest when the delete entered the RS) can be
put afterwards. Under the behaviour 'delete can mask puts that happened after
it', whose essence is deciding whether a kv is masked by a delete solely by
comparing their timestamps, a 'delete latest version' delete cannot tell
whether the first version it sees was the latest one when the delete itself
hit the RS. (It could get this information from mvcc, but it does not.)
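The failure can be sketched with the same kind of toy model (hypothetical
names, not HBase code). mask_first_seen stands for a read-free 'delete latest
version' that compares timestamps only; mask_by_mvcc adds the mvcc check from
step 2.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cell:
    ts: int
    mvcc: int
    value: str


def mask_first_seen(cells: List[Cell]) -> List[Cell]:
    """Read-free 'delete latest' without mvcc: with only timestamps to go on,
    the scan can do no better than mask the newest-timestamp cell it sees."""
    ordered = sorted(cells, key=lambda c: -c.ts)
    return ordered[1:]


def mask_by_mvcc(cells: List[Cell], delete_mvcc: int) -> List[Cell]:
    """Mask the newest-timestamp cell that was written before the delete."""
    ordered = sorted(cells, key=lambda c: -c.ts)
    for i, c in enumerate(ordered):
        if c.mvcc < delete_mvcc:
            return ordered[:i] + ordered[i + 1:]
    return ordered  # nothing existed before the delete
```

With v1 and v2 written before the delete (mvcc 3) and v3 written after it,
mask_first_seen wrongly removes v3, while mask_by_mvcc removes v2 as intended.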
Certainly we could use mvcc only for 'delete latest version' to get the
(remarkable) performance gain of removing the read, but that would be
inconsistent: deletes would be handled internally in different ways (one
using mvcc, the others not).
> Deletes can mask puts that happen after the delete
> --------------------------------------------------
>
> Key: HBASE-8721
> URL: https://issues.apache.org/jira/browse/HBASE-8721
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: Feng Honghua
> Attachments: HBASE-8721-0.94-V0.patch
>
>
> This fix aims at the bug mentioned in http://hbase.apache.org/book.html 5.8.2.1:
> "Deletes mask puts, even puts that happened after the delete was entered.
> Remember that a delete writes a tombstone, which only disappears after the
> next major compaction has run. Suppose you do a delete of everything <= T.
> After this you do a new put with a timestamp <= T. This put, even if it
> happened after the delete, will be masked by the delete tombstone. Performing
> the put will not fail, but when you do a get you will notice the put did have
> no effect. It will start working again after the major compaction has run.
> These issues should not be a problem if you use always-increasing versions
> for new puts to a row. But they can occur even if you do not care about time:
> just do delete and put immediately after each other, and there is some chance
> they happen within the same millisecond."
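The quoted scenario reduces to the masking predicate. A hypothetical sketch,
assuming a tombstone that masks every cell with timestamp <= T and an mvcc
number per write; masked_mvcc_aware models the behaviour proposed in this
issue, not an existing HBase API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Put:
    ts: int
    mvcc: int


@dataclass(frozen=True)
class DeleteUpTo:
    ts: int    # masks cells with timestamp <= ts
    mvcc: int


def masked_timestamp_only(put: Put, tomb: DeleteUpTo) -> bool:
    # Behaviour described in the book: only timestamps are compared.
    return put.ts <= tomb.ts


def masked_mvcc_aware(put: Put, tomb: DeleteUpTo) -> bool:
    # Proposed behaviour: a tombstone only masks puts written before it.
    return put.ts <= tomb.ts and put.mvcc < tomb.mvcc
```

A put landing in the same millisecond as the delete (ts 100, mvcc 6 against a
tombstone at ts 100, mvcc 5) is masked under the first rule but survives under
the second; a put written before the delete is masked under both.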