[ 
https://issues.apache.org/jira/browse/HBASE-21596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743331#comment-16743331
 ] 

Wellington Chevreuil commented on HBASE-21596:
----------------------------------------------

Hey [~stack], thanks for sharing your thoughts here!

bq. We do not have this problem reading from hfiles because when we flush, and 
CF Schema is VERSIONS => 1, we only write out one version dropping any others 
that may have been in memstore?
Yeah, and this is filtered by the memstore scanner. The problem is that the 
memstore scan filter logic for versions only counts the cells it has read so 
far, and once the VERSIONS limit has been reached, it just skips the remaining 
cells. So once we put a delete marker on the latest cell version, that cell is 
not counted, and older versions that should have disappeared now pop up in 
the scan results.
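
To illustrate the counting behaviour described above, here is a minimal toy sketch. It is not HBase code: the class name VersionCountSketch, the method visibleVersions and the list/set representation of cells and delete markers are all hypothetical, chosen only to mimic a version counter that ignores delete-marked cells.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only (assumption): this does not reproduce the real memstore scanner.
final class VersionCountSketch {

    // putTimestamps: put cells of a single column, newest first.
    // deletedTimestamps: timestamps masked by a version delete marker.
    // maxVersions: the column family VERSIONS setting.
    static List<Long> visibleVersions(List<Long> putTimestamps,
                                      Set<Long> deletedTimestamps,
                                      int maxVersions) {
        List<Long> result = new ArrayList<>();
        int counted = 0;
        for (long ts : putTimestamps) {
            if (deletedTimestamps.contains(ts)) {
                // The masked cell is skipped and NOT counted toward the limit.
                continue;
            }
            if (counted >= maxVersions) {
                // Limit reached: the remaining cells are skipped.
                break;
            }
            result.add(ts);
            counted++;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> puts = List.of(3L, 2L, 1L);
        // No delete marker: only the newest version is returned.
        System.out.println(visibleVersions(puts, Set.of(), 1));   // [3]
        // Delete marker on the newest version: the older version at ts=2 shows up.
        System.out.println(visibleVersions(puts, Set.of(3L), 1)); // [2]
    }
}
```

With VERSIONS => 1 and three puts, masking only the newest timestamp makes the cell at ts=2 visible again in this model, which matches the symptom reported in this issue.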

bq. The versions => 1 case is a specialization on what is described here? 
http://hbase.apache.org/book.html#major.compactions.change.query.results
I would say it's a variation of that, but one that occurs in the memstore, so 
it does not get cleared by major compaction. The problem here is that once the 
delete has happened in the memstore and all the extra versions are still 
there, a flush will still write the incorrect state to the hfile, because it 
uses the memstore scanner to decide what to write.

bq. Reading the class comment on Delete class, is the shell not doing it right? 
Reading the class comment, a delete#addColumn w/ no ts is supposed to delete 
all extant versions?
It should actually delete only the latest version:
{noformat}
public Delete addColumn(byte[] family,
                        byte[] qualifier)
Delete the latest version of the specified column. This is an expensive call in 
that on the server-side, it first does a get to find the latest versions 
timestamp. Then it adds a delete using the fetched cells timestamp.
{noformat}
The shell was not compliant with the above description: it was actually 
deleting all versions. This was fixed by HBASE-18142, and the shell delete 
command now follows the same behaviour described in the Delete class Javadoc.


> HBase Shell "delete" command can cause older versions to be shown even if 
> VERSIONS is configured as 1
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21596
>                 URL: https://issues.apache.org/jira/browse/HBASE-21596
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Wellington Chevreuil
>            Assignee: Wellington Chevreuil
>            Priority: Minor
>         Attachments: HBASE-21596-master.001.patch, 
> HBASE-21596-master.002.patch, HBASE-21596-master.003.patch, initial-patch.txt
>
>
> HBase Shell delete command is supposed to operate on a specific TS. If no 
> TS is given, it will assume the latest TS for the cell and put a delete 
> marker on it. 
> However, for a CF with VERSIONS => 1, if multiple puts were performed for 
> the same cell, there may be multiple cell versions in the memstore, so delete 
> would only "delete mark" one of those, causing the most recent unmarked 
> one to be shown on gets/scans, which then contradicts the CF "VERSIONS 
> => 1" configuration.
> This issue is not seen with the deleteall command or when using the Delete 
> operation from the Java API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
