[
https://issues.apache.org/jira/browse/HBASE-21596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743331#comment-16743331
]
Wellington Chevreuil commented on HBASE-21596:
----------------------------------------------
Hey [~stack], thanks for sharing your thoughts here!
bq. We do not have this problem reading from hfiles because when we flush, and
CF Schema is VERSIONS => 1, we only write out one version dropping any others
that may have been in memstore?
Yeah, and this is filtered by the memstore scanner. The problem is that the
memstore scan filter logic for versions only counts the number of cells it has
read so far; once the VERSIONS limit has been reached, it just skips the
remaining cells. So once we put a delete marker on the latest cell version,
that cell is not counted, and older versions that should have disappeared
now pop up in the scan results.
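To make the counting problem concrete, here is a minimal sketch in plain Java. It is a toy model of the behaviour described above, not the actual HBase scanner code: the class name, the scan method and the countMasked flag are all made up for illustration. With countMasked set to false, a cell masked by a delete marker does not use up a version slot, so the next older version is returned. Setting it to true shows the result one would expect for VERSIONS => 1; it is not meant to represent the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model of per-column version counting during a scan; not HBase code. */
final class VersionCountSketch {

    /**
     * Returns the timestamps a scan would emit for a single column.
     *
     * @param putTimestamps     put timestamps, sorted newest first
     * @param deletedTimestamps timestamps masked by a version delete marker
     * @param maxVersions       the column family VERSIONS setting
     * @param countMasked       hypothetical switch: whether a masked cell
     *                          uses up a version slot
     */
    static List<Long> scan(List<Long> putTimestamps, Set<Long> deletedTimestamps,
                           int maxVersions, boolean countMasked) {
        List<Long> visible = new ArrayList<>();
        int counted = 0;
        for (long ts : putTimestamps) {
            if (counted >= maxVersions) {
                break; // VERSIONS limit reached: the remaining cells are skipped
            }
            if (deletedTimestamps.contains(ts)) {
                if (countMasked) {
                    counted++; // masked cell still consumes a version slot
                }
                continue; // a masked cell is never returned
            }
            visible.add(ts);
            counted++;
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Long> puts = List.of(3L, 2L, 1L); // three puts on the same cell
        Set<Long> deleted = Set.of(3L);        // delete marker on the latest one

        // Masked cell not counted: the older version at ts=2 shows up.
        System.out.println(scan(puts, deleted, 1, false)); // prints [2]

        // Masked cell counted: nothing is returned, as VERSIONS => 1 suggests.
        System.out.println(scan(puts, deleted, 1, true));  // prints []
    }
}
```

The three puts followed by a delete on the newest timestamp mirror the scenario from the issue description.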
bq. The versions => 1 case is a specialization on what is described here?
http://hbase.apache.org/book.html#major.compactions.change.query.results
I would say it's a variation of that, but one that occurs in the memstore, so
it does not get cleared by major compaction. The problem here is that once the
delete has happened in the memstore while all the extra versions are still
there, a flush will write the incorrect state to the hfile, because it uses the
memstore scanner to decide what to write.
bq. Reading the class comment on Delete class, is the shell not doing it right?
Reading the class comment, a delete#addColumn w/ no ts is supposed to delete
all extant versions?
It should actually delete only the latest version:
{noformat}
public Delete addColumn(byte[] family, byte[] qualifier)
Delete the latest version of the specified column. This is an expensive call in
that on the server-side, it first does a get to find the latest versions
timestamp. Then it adds a delete using the fetched cells timestamp.
{noformat}
The shell was not compliant with the above description: it was actually
deleting all versions. This was fixed by HBASE-18142, and the shell delete
command now follows the behaviour described in the Delete class javadoc.
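For reference, the difference between the two delete flavours can be sketched in plain Java. This is only an illustration with made-up names (DeleteSemanticsSketch, deleteLatestVersion, deleteAllVersions), not the HBase client API, and it assumes a column that is allowed to keep several versions. The first method mimics the documented Delete#addColumn(family, qualifier) semantics; the second mimics deleting every version, as Delete#addColumns or the shell deleteall command would.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of one column's versions; not the HBase client API. */
final class DeleteSemanticsSketch {

    /** Versions of a single column, newest timestamp first. */
    private final NavigableMap<Long, String> versions =
            new TreeMap<>(Comparator.reverseOrder());

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Models "delete the latest version of the specified column". */
    void deleteLatestVersion() {
        versions.pollFirstEntry(); // removes only the newest entry, if any
    }

    /** Models deleting every version of the column. */
    void deleteAllVersions() {
        versions.clear();
    }

    /** Returns the newest remaining value, or null if none is left. */
    String read() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        DeleteSemanticsSketch column = new DeleteSemanticsSketch();
        column.put(1L, "v1");
        column.put(2L, "v2");
        column.put(3L, "v3");

        column.deleteLatestVersion();
        System.out.println(column.read()); // prints v2

        column.deleteAllVersions();
        System.out.println(column.read()); // prints null
    }
}
```

Seeing "v2" after the first delete is the documented outcome when several versions are kept; the bug in this issue is that the same thing happens from the memstore when the CF is configured with VERSIONS => 1.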
> HBase Shell "delete" command can cause older versions to be shown even if
> VERSIONS is configured as 1
> -----------------------------------------------------------------------------------------------------
>
> Key: HBASE-21596
> URL: https://issues.apache.org/jira/browse/HBASE-21596
> Project: HBase
> Issue Type: Bug
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Minor
> Attachments: HBASE-21596-master.001.patch,
> HBASE-21596-master.002.patch, HBASE-21596-master.003.patch, initial-patch.txt
>
>
> HBase Shell delete command is supposed to operate on a specific TS. If no
> TS is given, it will assume the latest TS for the cell and put a delete
> marker for it.
> However, for a CF with VERSIONS => 1, if multiple puts were performed for the
> same cell, there may be multiple cell versions in the memstore, so delete
> would only be "delete marking" one of those, causing the most recent
> unmarked one to be shown on gets/scans, which then contradicts the CF
> "VERSIONS => 1" configuration.
> This issue is not seen with the deleteall command or when using the Delete
> operation from the Java API.