[ https://issues.apache.org/jira/browse/HBASE-17177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714094#comment-15714094 ]

stack commented on HBASE-17177:
-------------------------------

A region opens after a move, and a major compaction could start. It would look
for the smallest read point. There might be no scanners registered yet, so it
would think it could apply all delete markers and clean up the cells they cover.

Afterwards, a restarted scan comes in with an mvcc that is older than the
current read point.

The region does not keep a record of the mvcc that the last or currently
ongoing major compaction used. If it did, we could fail the scan if its mvcc
were older than that of the major compaction.
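A minimal sketch of that bookkeeping, assuming a major compaction drops cells covered by delete markers whose mvcc is at or below the smallest read point it started with. Under that assumption a restarted scan is only safe if its mvcc is at least that value. `MajorCompactionReadPointGuard` and its methods are hypothetical names, not HBase API.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical per-region guard (not an HBase class): remembers the largest
 * smallest-read-point that any major compaction on this region has started with.
 */
final class MajorCompactionReadPointGuard {
  // Long.MIN_VALUE means "no major compaction has started since the region opened".
  private final AtomicLong lastMajorCompactionReadPoint = new AtomicLong(Long.MIN_VALUE);

  /** Call when a major compaction starts, before it drops any deleted cells. */
  void recordMajorCompaction(long smallestReadPoint) {
    // Keep the maximum so an older, slower compaction cannot lower the bar.
    lastMajorCompactionReadPoint.accumulateAndGet(smallestReadPoint, Math::max);
  }

  /** A restarted scan is safe only if no compaction applied markers newer than its mvcc. */
  boolean isRestartedScanSafe(long scanMvcc) {
    return scanMvcc >= lastMajorCompactionReadPoint.get();
  }

  /** Fails the restarted scan instead of returning a view with cells silently missing. */
  void checkRestartedScan(long scanMvcc) {
    long compactionReadPoint = lastMajorCompactionReadPoint.get();
    if (scanMvcc < compactionReadPoint) {
      throw new IllegalStateException("scan mvcc " + scanMvcc
          + " is older than major compaction read point " + compactionReadPoint);
    }
  }
}
```

The check is a single comparison per restarted scan; the cost is that a client whose scan is rejected has to start over with a fresh mvcc.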

Yeah, it seems smart to delay major compaction until a good while after a
region opens, so restarted scanners have a chance of getting back in. Can we
find a latch that is not time based (e.g. "wait a few minutes")?

A compaction gets promoted from minor to major if the minor compaction happens
to include all hfiles. We'd have to undo this or not allow the upgrade.
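A rough sketch of the time-based variant (the "wait a few minutes" latch), combined with refusing the minor-to-major upgrade while the latch is closed. `PostOpenCompactionGate`, `gracePeriodMillis`, `mayRunMajorCompaction` and `shouldRunAsMajor` are hypothetical names; the clock is injected only so the behaviour is testable. A non-time-based latch would replace the body of `mayRunMajorCompaction()`.

```java
import java.util.function.LongSupplier;

/** Hypothetical compaction gate (not an HBase class), created when the region opens. */
final class PostOpenCompactionGate {
  private final LongSupplier clock;
  private final long gracePeriodMillis;
  private final long regionOpenedAtMillis;

  PostOpenCompactionGate(long gracePeriodMillis, LongSupplier clock) {
    this.clock = clock;
    this.gracePeriodMillis = gracePeriodMillis;
    this.regionOpenedAtMillis = clock.getAsLong();
  }

  /** Time-based latch: major compactions wait until the grace period after open has passed. */
  boolean mayRunMajorCompaction() {
    return clock.getAsLong() - regionOpenedAtMillis >= gracePeriodMillis;
  }

  /**
   * A minor compaction whose selection covers every store file would normally be
   * promoted to major; while the latch is closed the promotion is refused.
   */
  boolean shouldRunAsMajor(int selectedFiles, int totalFiles) {
    return selectedFiles == totalFiles && mayRunMajorCompaction();
  }
}
```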

Not sure about NONE/ROW/REGION. Can we do REGION first, since mvcc is tracked
per region, and then do ROW and NONE if needed?

This is an awkward problem. 

> Major compaction can break the region/row level atomic when scan even if we pass mvcc to client
> -----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17177
>                 URL: https://issues.apache.org/jira/browse/HBASE-17177
>             Project: HBase
>          Issue Type: Sub-task
>          Components: scan
>            Reporter: Duo Zhang
>             Fix For: 2.0.0, 1.4.0
>
>
> We know that major compaction will actually delete the cells which are
> deleted by a delete marker. In order to give a consistent view for a scan, we
> need to use a map to track the read points for all scanners for a region, and
> the smallest one will be used for a compaction. For all delete markers whose
> mvcc is greater than this value, we will not use them to delete other cells.
> The problem for a scan restarted after a region move is that the new RS does
> not have the information about the scanners opened at the old RS until the
> client sends scan requests to the new RS, which means the read points map is
> incomplete and the smallest read point may be greater than the correct value.
> So if a major compaction happens at that time, it may delete some cells which
> should be kept.
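The read point bookkeeping described in the issue can be sketched as below. `RegionReadPoints` and its methods are hypothetical, not HBase's classes. Falling back to the region's current read point when no scanner is registered is an assumption, chosen to match the "there might be none" case in the comment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-region read point bookkeeping (not HBase's actual classes). */
final class RegionReadPoints {
  // scanner id -> read point (mvcc) the scanner was opened with
  private final Map<Long, Long> scannerReadPoints = new ConcurrentHashMap<>();
  private volatile long currentReadPoint;

  RegionReadPoints(long currentReadPoint) {
    this.currentReadPoint = currentReadPoint;
  }

  void advanceReadPoint(long readPoint) { currentReadPoint = readPoint; }

  void registerScanner(long scannerId, long readPoint) { scannerReadPoints.put(scannerId, readPoint); }

  void closeScanner(long scannerId) { scannerReadPoints.remove(scannerId); }

  /** Smallest read point any registered scanner may still need; handed to a compaction. */
  long smallestReadPoint() {
    long smallest = currentReadPoint; // assumption: used when no scanner is registered
    for (long readPoint : scannerReadPoints.values()) {
      smallest = Math.min(smallest, readPoint);
    }
    return smallest;
  }

  /** A marker whose mvcc is greater than the smallest read point must not delete other cells. */
  static boolean canApplyDeleteMarker(long markerMvcc, long smallestReadPoint) {
    return markerMvcc <= smallestReadPoint;
  }
}
```

For example, with current read point 20 and a scanner opened at read point 10 on the old RS, a delete marker at mvcc 15 is kept away from other cells. On the new RS that scanner is not registered yet, the smallest read point is 20, and the same marker may be applied, which is the lost-cells case described above.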



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
