[ 
https://issues.apache.org/jira/browse/HBASE-9797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-9797.
----------------------------------------
    Resolution: Won't Fix

> Multi row transactions are not atomic for scanners
> --------------------------------------------------
>
>                 Key: HBASE-9797
>                 URL: https://issues.apache.org/jira/browse/HBASE-9797
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Priority: Major
>
> Multi-row atomic puts, as implemented by the coprocessor API, are atomic for
> gets and multi-gets, but not for scanners.
> The mvcc read point is, as of today, kept only in RS memory. When a client
> starts a scan, we create a new scanner object and save the scan's mvcc read
> point there. Since the scan API is row-based, scan results are made visible
> to the client one row at a time, and the client scanner keeps track of the
> last row seen.
> So, for a multi-row atomic update, the scanner might get an mvcc read number
> that is less than the commit point of the multi-row update, so the rows
> written by that update will not be visible to it. However, in case of RS
> failover, a new scanner will be created with an mvcc read number larger than
> the multi-row update's commit number, so the new scanner will see the
> remaining rows from the transaction. The client then observes only part of
> an atomic update.
> Example: 
> {code}
> multi put : { {row1, c1, v1}, {row100, c1, v100} } mvcc write number = 2
> scan : scan from row1 to row100  mvcc read number = 1
> {code}
> The scanner will not see row1. If the RS fails before the scanner reaches
> row100, the new scanner will get an mvcc read number > 2, so it will see
> row100: the client observes half of the multi-row put.
> There might be a couple of ways to fix this. The first approach (as suggested
> by Sergey) is to wrap the Scanner in an atomic scanner implementation, which
> will restart the scan in case of a socket timeout, server failure, etc. It
> will buffer the results on the client so that no rows are made visible until
> the whole scan has completed. For small scans (like meta) this might be
> viable.
> The second, proper way to fix this is to first finish up the patch at
> HBASE-8763, then change the scanner to obtain an mvcc number from the RS on
> scanner open and save that mvcc number on the client side. Upon failure, the
> scanner will continue the scan where it left off, using the saved read point.
> We would have to track the low watermark (the smallest mvcc read number of
> the scanners currently open) differently: that number is already tracked
> today, but not across RS failover. I think we can use timeouts to manage the
> low watermark.
> This approach also enables us to implement a cell-based streaming scan
> instead of the row-based approach we have today.
> Opened the issue, so that it is tracked. Feel free to pick it up if you like. 
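
The read-point anomaly in the description can be reproduced with a toy model. This is a minimal sketch, assuming a single in-memory region where a cell version is visible when its mvcc write number is <= the scanner's read point. ToyRegion, demoRegion, scanWithFailover and the pre-existing values "old1"/"old100" are hypothetical and are not HBase code. The stop row is treated as inclusive to match the example, whereas HBase scans treat the stop row as exclusive by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of the torn-scan anomaly; hypothetical names, not HBase code. */
public class TornScanDemo {
    /** One cell version, tagged with the mvcc write number that committed it. */
    record Version(long writeNumber, String value) {}

    /** Sorted row -> versions map standing in for one region. */
    static final class ToyRegion {
        private final TreeMap<String, List<Version>> rows = new TreeMap<>();

        void put(String row, long writeNumber, String value) {
            rows.computeIfAbsent(row, r -> new ArrayList<>()).add(new Version(writeNumber, value));
        }

        /** Newest value with writeNumber <= readPoint, or null if none is visible. */
        String read(String row, long readPoint) {
            Version best = null;
            for (Version v : rows.getOrDefault(row, List.of())) {
                if (v.writeNumber() <= readPoint && (best == null || v.writeNumber() > best.writeNumber())) {
                    best = v;
                }
            }
            return best == null ? null : best.value();
        }

        /** Row keys from startRow to stopRow, both inclusive (a simplification). */
        List<String> rowKeys(String startRow, String stopRow) {
            return new ArrayList<>(rows.subMap(startRow, true, stopRow, true).keySet());
        }
    }

    /** Hypothetical region state: old values at write 1, the multi put at write 2. */
    static ToyRegion demoRegion() {
        ToyRegion region = new ToyRegion();
        region.put("row1", 1, "old1");
        region.put("row100", 1, "old100");
        region.put("row1", 2, "v1");     // multi-row atomic put,
        region.put("row100", 2, "v100"); // mvcc write number = 2
        return region;
    }

    /**
     * Row-by-row scan. The first failAfter rows are read at readPoint; then the
     * RS "fails" and the remaining rows are read by a reopened scanner at newReadPoint.
     */
    static List<String> scanWithFailover(ToyRegion region, String start, String stop,
                                         long readPoint, int failAfter, long newReadPoint) {
        List<String> out = new ArrayList<>();
        List<String> keys = region.rowKeys(start, stop);
        for (int i = 0; i < keys.size(); i++) {
            out.add(region.read(keys.get(i), i < failAfter ? readPoint : newReadPoint));
        }
        return out;
    }

    public static void main(String[] args) {
        ToyRegion region = demoRegion();
        System.out.println(scanWithFailover(region, "row1", "row100", 1, 2, 2)); // no failure: [old1, old100]
        System.out.println(scanWithFailover(region, "row1", "row100", 1, 1, 2)); // failover:   [old1, v100]
    }
}
```

Without a failure the scan sees one consistent snapshot. When the scanner is reopened after the first row with read point 2, it returns the old value of row1 but the new value of row100, i.e. half of the atomic put.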

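The first approach (buffer the whole scan on the client and restart it on failure) might look roughly like the following. It is a sketch under stated assumptions: RowScanner, ScannerFailedException, scanAtomically and flakyOpener are hypothetical names rather than HBase client classes, each call to the opener is assumed to open a fresh server-side scanner with a fresh mvcc read point, and a socket timeout or server failure is modeled as a checked exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Sketch of a client-side "atomic scanner"; hypothetical names, not HBase classes. */
public class AtomicScannerSketch {

    /** Stands in for a socket timeout or RS failure seen mid-scan. */
    static class ScannerFailedException extends Exception {
        ScannerFailedException(String msg) {
            super(msg);
        }
    }

    /** Row-at-a-time scanner; next() returns null at the end of the scan. */
    interface RowScanner {
        String next() throws ScannerFailedException;
    }

    /**
     * Each attempt opens a fresh scanner (assumed to get a fresh mvcc read point),
     * reads it to the end into a private buffer, and only then returns the rows.
     * On failure the partial buffer is discarded and the scan restarts from the
     * first row, so callers never see rows read at two different read points.
     */
    static List<String> scanAtomically(Supplier<RowScanner> openScanner, int maxAttempts)
            throws ScannerFailedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ScannerFailedException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<String> buffer = new ArrayList<>();
            RowScanner scanner = openScanner.get();
            try {
                for (String row = scanner.next(); row != null; row = scanner.next()) {
                    buffer.add(row);
                }
                return buffer; // complete result from a single read point
            } catch (ScannerFailedException e) {
                last = e; // drop the partial buffer and retry from the start
            }
        }
        throw last;
    }

    /** Test double: the first scanner (read point 1) fails after one row. */
    static Supplier<RowScanner> flakyOpener() {
        int[] opens = {0};
        return () -> {
            int attempt = opens[0]++;
            List<String> rows = attempt == 0 ? List.of("old1", "old100") : List.of("v1", "v100");
            int[] pos = {0};
            return () -> {
                if (attempt == 0 && pos[0] == 1) {
                    throw new ScannerFailedException("RS failed mid-scan");
                }
                return pos[0] < rows.size() ? rows.get(pos[0]++) : null;
            };
        };
    }

    public static void main(String[] args) throws ScannerFailedException {
        System.out.println(scanAtomically(flakyOpener(), 3)); // [v1, v100]
    }
}
```

Because the partial buffer from a failed attempt is thrown away, the caller gets either all rows from one read point or an exception. The cost is holding the whole result in client memory, which is why this only suits small scans.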

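For the second approach, the low-watermark bookkeeping with timeouts could be sketched as a table of read-point leases. This is an assumption-laden sketch, not a design taken from HBASE-8763: ReadPointLeases, register, reopen and lowWatermark are hypothetical, and the lease table is assumed to be visible to whichever RS reopens the scanner after failover, which the description notes is not true of today's in-memory tracking.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/**
 * Hypothetical lease table for scanner mvcc read points (not an HBase API).
 * Assumes it is shared by every RS that may reopen a scanner after failover.
 */
public class ReadPointLeases {
    /** A scanner's pinned read point and the time its lease runs out. */
    record Lease(long readPoint, long expiresAtMillis) {}

    private final long leaseMillis;
    private final Map<String, Lease> leases = new HashMap<>();

    ReadPointLeases(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    /** Scanner open (or heartbeat): pin readPoint until nowMillis + leaseMillis. */
    void register(String scannerId, long readPoint, long nowMillis) {
        leases.put(scannerId, new Lease(readPoint, nowMillis + leaseMillis));
    }

    /** Drops expired leases, then returns the smallest live read point, if any. */
    OptionalLong lowWatermark(long nowMillis) {
        leases.values().removeIf(l -> l.expiresAtMillis() <= nowMillis);
        return leases.values().stream().mapToLong(Lease::readPoint).min();
    }

    /**
     * Reopen after RS failover with the read point the client saved at open.
     * Succeeds (and renews the lease) only if the lease is still live and matches;
     * otherwise the versions the scan needs may no longer be retained.
     */
    boolean reopen(String scannerId, long savedReadPoint, long nowMillis) {
        Lease l = leases.get(scannerId);
        if (l == null || l.expiresAtMillis() <= nowMillis || l.readPoint() != savedReadPoint) {
            return false;
        }
        register(scannerId, savedReadPoint, nowMillis);
        return true;
    }

    public static void main(String[] args) {
        ReadPointLeases t = new ReadPointLeases(60_000);
        t.register("scanA", 1, 0);
        t.register("scanB", 5, 0);
        System.out.println(t.lowWatermark(1_000));        // OptionalLong[1]
        System.out.println(t.reopen("scanA", 1, 30_000)); // true, lease renewed to 90_000
        System.out.println(t.lowWatermark(70_000));       // OptionalLong[1], scanB expired
        System.out.println(t.lowWatermark(100_000));      // OptionalLong.empty
    }
}
```

A lease that times out simply drops out of the low watermark, so a client that disappears without closing its scanner cannot pin old versions forever. The trade-off is that a client resuming after its lease has expired must fail or restart its scan.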

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
