[ https://issues.apache.org/jira/browse/HBASE-9797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Kyle Purtell resolved HBASE-9797.
----------------------------------------
Resolution: Won't Fix
> Multi row transactions are not atomic for scanners
> --------------------------------------------------
>
> Key: HBASE-9797
> URL: https://issues.apache.org/jira/browse/HBASE-9797
> Project: HBase
> Issue Type: Bug
> Reporter: Enis Soztutar
> Priority: Major
>
> Multi-row atomic puts, as implemented by the coprocessor API, are atomic for
> gets and multi-gets, but not for scanners.
> The mvcc read point, as of today, is kept only in RS memory. When a client
> starts a scan, we create a new scanner object and save the mvcc read point of
> the scan there. Since the scan API is row-based, the scan results are made
> visible to clients row by row, and the client scanner keeps track of the last
> row seen.
> So, for a multi-row atomic update, the scanner might get an mvcc read number
> that is less than the commit point of the multi-row update, so it will skip
> those rows in the scan (it will not see them). However, in case of RS
> failover, a new scanner will be created with an mvcc read number larger than
> the multi-row update commit number, so the new scanner will see the remaining
> rows from the transaction.
> Example:
> {code}
> multi put : { {row1, c1, v1}, {row100, c1, v100} } mvcc write number = 2
> scan : scan from row1 to row100 mvcc read number = 1
> {code}
> The scanner will not see row1. If the RS fails before the scanner reaches
> row100, the new scanner will get an mvcc read number > 2, so it will see
> row100.
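The anomaly in the example above can be reproduced with a toy model. This is a minimal sketch, not HBase code: `ToyRegion`, its methods, and the integer row keys (1, 50, 100) are hypothetical, and an extra pre-existing row 50 is assumed so that the client has a "last row seen" before the simulated failover.

```python
class ToyRegion:
    """Toy in-memory region (hypothetical, not an HBase class)."""

    def __init__(self):
        self.cells = {}      # row -> (value, mvcc write number)
        self.read_point = 0  # latest committed mvcc write number

    def multi_put(self, puts):
        # All rows of one multi put share a single mvcc write number.
        self.read_point += 1
        for row, value in puts.items():
            self.cells[row] = (value, self.read_point)
        return self.read_point

    def scan(self, start_row, read_point):
        # Row-by-row scan that hides cells newer than the scanner's read point.
        for row in sorted(self.cells):
            value, mvcc = self.cells[row]
            if row >= start_row and mvcc <= read_point:
                yield row, value


region = ToyRegion()
region.multi_put({50: "old"})             # mvcc write number = 1
read_point = region.read_point            # scanner opens: mvcc read number = 1
region.multi_put({1: "v1", 100: "v100"})  # mvcc write number = 2

seen = []
scanner = region.scan(1, read_point)
seen.append(next(scanner))                # row 1 is skipped, row 50 is returned

# Simulated RS failover: the saved read point is lost, so the client opens a
# new scanner after the last row seen, and it gets the current read point.
last_row = seen[-1][0]
seen.extend(region.scan(last_row + 1, region.read_point))

print(seen)  # row 100 is visible although row 1 was not
```

In this model the client ends up with row 100 of the multi put but never saw row 1, which is the partial view of the transaction described above.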
> There might be a couple of ways to fix this. The first approach (as suggested
> by Sergey) is to wrap the Scanner in an atomic scanner implementation that
> restarts the scan in case of a socket timeout, server failure, etc. It would
> batch up the results so that the rows are not visible to the caller until the
> scan completes. For small scans (like meta) this might be viable.
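A rough sketch of the first approach, assuming a hypothetical `open_scanner` callable that returns a fresh row iterator on every call. The names `atomic_scan` and `max_attempts` and the choice of exception types are illustrative assumptions, not part of any HBase client API.

```python
def atomic_scan(open_scanner, max_attempts=3):
    """Buffer a whole scan; on failure, drop the partial results and restart.

    open_scanner is a hypothetical callable returning a new iterator of rows.
    """
    last_error = None
    for _ in range(max_attempts):
        buffered = []
        try:
            for row in open_scanner():
                buffered.append(row)
            return buffered  # only a fully completed scan is exposed
        except (ConnectionError, TimeoutError) as err:
            last_error = err  # partial results in `buffered` are discarded
    raise ConnectionError("scan did not complete") from last_error


# Usage: a scanner that fails once in the middle of the first attempt.
attempts = {"n": 0}

def flaky_open():
    attempts["n"] += 1

    def rows():
        yield "row1"
        if attempts["n"] == 1:
            raise ConnectionError("simulated region server failure")
        yield "row100"

    return rows()

result = atomic_scan(flaky_open)
```

Because every restart rescans from the beginning and holds all rows in memory, this only looks practical for small scans, as noted above.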
> The second way, which fixes this properly, is to first finish the patch at
> HBASE-8763, then change the scanner to obtain an mvcc number from the RS on
> scanner open and save that mvcc number on the client side. Upon failure, the
> scanner will continue the scan where it left off. We have to keep the low
> watermark (the smallest mvcc read number of the scanners currently open)
> differently. Currently that number is already tracked, but not across RS
> failover. We can do timeouts to manage the low watermark, I think.
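One way the timeout idea could look, purely as an illustration: each open scanner holds a lease on its mvcc read number, and the low watermark is the smallest read number among leases that have not expired. `LowWatermarkTracker`, its methods, and the explicit `now` parameter are all hypothetical names chosen for this sketch; the issue does not specify a design.

```python
class LowWatermarkTracker:
    """Hypothetical lease-based tracker for the smallest open read number."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.leases = {}  # scanner id -> (mvcc read number, expiry time)

    def register(self, scanner_id, read_point, now):
        self.leases[scanner_id] = (read_point, now + self.lease_seconds)

    def renew(self, scanner_id, now):
        # Called whenever the client scanner makes progress.
        read_point, _ = self.leases[scanner_id]
        self.leases[scanner_id] = (read_point, now + self.lease_seconds)

    def low_watermark(self, now, current_read_point):
        # Drop expired leases, then take the smallest remaining read number.
        self.leases = {
            sid: lease for sid, lease in self.leases.items() if lease[1] > now
        }
        read_points = [rp for rp, _ in self.leases.values()]
        return min(read_points, default=current_read_point)


tracker = LowWatermarkTracker(lease_seconds=60)
tracker.register("scanner-a", read_point=1, now=0)
tracker.register("scanner-b", read_point=5, now=30)

wm_early = tracker.low_watermark(now=40, current_read_point=9)   # both leases live
wm_later = tracker.low_watermark(now=70, current_read_point=9)   # scanner-a expired
wm_final = tracker.low_watermark(now=100, current_read_point=9)  # no leases left
```

A scanner whose lease has expired could no longer resume with its old read number, so in a real design it would have to fail or restart rather than silently continue.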
> This approach also enables us to implement a cell-based streaming scan
> instead of the row-based approach we have today.
> Opened the issue, so that it is tracked. Feel free to pick it up if you like.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)