[jira] [Commented] (HBASE-15340) Partial row result of scan may return data violates the row-level transaction

Jianwei Cui (JIRA) Fri, 26 Feb 2016 02:14:32 -0800

    [ https://issues.apache.org/jira/browse/HBASE-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168771#comment-15168771 ]


Jianwei Cui commented on HBASE-15340:
-------------------------------------

[~anoop.hbase], thanks for your comment; I get your point :). Yes, the case you 
mentioned will happen. The page https://hbase.apache.org/acid-semantics.html 
explains the consistency guarantees for scans:
{code}
A scan is not a consistent view of a table. Scans do not exhibit snapshot 
isolation.

Rather, scans have the following properties:

1. Any row returned by the scan will be a consistent view (i.e. that version of 
the complete row existed at some point in time) [1]
2. A scan will always reflect a view of the data at least as new as the 
beginning of the scan. This satisfies the visibility guarantees enumerated 
below.
    1. For example, if client A writes data X and then communicates via a side 
channel to client B, any scans started by client B will contain data at least 
as new as X.
    2. A scan _must_ reflect all mutations committed prior to the construction 
of the scanner, and _may_ reflect some mutations committed subsequent to the 
construction of the scanner.
    3. Scans must include all data written prior to the scan (except in the 
case where data is subsequently mutated, in which case it _may_ reflect the 
mutation)
{code}
It seems the consistency guarantee for scans only ensures that the data read is 
at least as new as the beginning of the scan. There is no guarantee about 
whether data written concurrently with the scan, or after it began, is read. 

At the end of the page:
{code}
[1] A consistent view is not guaranteed intra-row scanning -- i.e. fetching a 
portion of a row in one RPC then going back to fetch another portion of the row 
in a subsequent RPC. Intra-row scanning happens when you set a limit on how 
many values to return per Scan#next (See Scan#setBatch(int)).
{code}
This footnote describes the problem of this jira: a row-level consistent view 
is not guaranteed for intra-row scanning. So is this a known problem?
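
To make the intra-row case concrete, here is a toy model in plain Java with no HBase dependency. It is only a sketch under simplifying assumptions: every name in it (TornRowSketch, Cell, put, openScanner, read) is made up for illustration and is not an HBase API, and the "mvcc" here is just a counter that each put increments. It replays the steps from the issue description quoted below: read 'F:c1' with the first scanner, write both columns again, then read 'F:c2' with a scanner that was reopened after the write.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are HBase APIs.
final class TornRowSketch {

    // One stored cell version, tagged with the sequence number of its write.
    static final class Cell {
        final String column;
        final String value;
        final long seq;

        Cell(String column, String value, long seq) {
            this.column = column;
            this.value = value;
            this.seq = seq;
        }
    }

    // All versions of a single row, appended in write order.
    private final List<Cell> cells = new ArrayList<>();
    private long lastSeq = 0;

    // Row-level put: all columns written together share one sequence number.
    void put(Map<String, String> columns) {
        long seq = ++lastSeq;
        for (Map.Entry<String, String> e : columns.entrySet()) {
            cells.add(new Cell(e.getKey(), e.getValue(), seq));
        }
    }

    // Assumption: a newly opened scanner reads at the latest committed write.
    long openScanner() {
        return lastSeq;
    }

    // Newest value of the column visible at readPoint, or null if none.
    String read(String column, long readPoint) {
        String result = null;
        long best = -1;
        for (Cell c : cells) {
            if (c.column.equals(column) && c.seq <= readPoint && c.seq > best) {
                result = c.value;
                best = c.seq;
            }
        }
        return result;
    }

    static Map<String, String> bothColumns(String value) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("F:c1", value);
        m.put("F:c2", value);
        return m;
    }

    // Scanner is reopened between the two partial reads (the reported case).
    static Map<String, String> scanWithReopen() {
        TornRowSketch region = new TornRowSketch();
        region.put(bothColumns("value1"));
        Map<String, String> seen = new LinkedHashMap<>();
        long readPoint = region.openScanner();
        seen.put("F:c1", region.read("F:c1", readPoint));
        region.put(bothColumns("value2"));
        readPoint = region.openScanner(); // new scanner, newer read point
        seen.put("F:c2", region.read("F:c2", readPoint));
        return seen;
    }

    // Same writes, but the original read point is kept for the whole row.
    static Map<String, String> scanWithoutReopen() {
        TornRowSketch region = new TornRowSketch();
        region.put(bothColumns("value1"));
        Map<String, String> seen = new LinkedHashMap<>();
        long readPoint = region.openScanner();
        seen.put("F:c1", region.read("F:c1", readPoint));
        region.put(bothColumns("value2"));
        seen.put("F:c2", region.read("F:c2", readPoint));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("reopened scanner: " + scanWithReopen());
        System.out.println("same scanner:     " + scanWithoutReopen());
    }
}
```

In this model the reopened scanner returns 'F:c1' from the first put and 'F:c2' from the second, while keeping the original read point returns both columns from the first put. Whether a real scanner can keep its read point across a region move is a separate question that this sketch does not answer.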

> Partial row result of scan may return data violates the row-level transaction 
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-15340
>                 URL: https://issues.apache.org/jira/browse/HBASE-15340
>             Project: HBase
>          Issue Type: Bug
>          Components: Scanners, Transactions/MVCC
>    Affects Versions: 2.0.0
>            Reporter: Jianwei Cui
>
> There are cases where the region server will return a partial row result, 
> such as when the client sets a batch for the scan or the configured size 
> limit is reached. In these situations, the client may return data that 
> violates the row-level transaction to the application. The following steps 
> show the problem:
> {code}
> // assume there is a test table 'test_table' with one family 'F' and one 
> region 'region'. 
> // meanwhile there are two region servers 'rsA' and 'rsB'.
> 1. Let 'region' first be located on 'rsA' and put one row with two columns 
> 'c1' and 'c2' as:
>     > put 'test_table', 'row', 'F:c1', 'value1', 'F:c2', 'value1'
> 2. Start a client to scan 'test_table', with scan.setBatch(1) and 
> scan.setCaching(1). The client will get one column as : {column='F:c1' and 
> value='value1'} in the first rpc call after scanner created, and the result 
> will be returned to application.
> 3. Before the client issues the next request, the 'region' is moved to 'rsB' 
> and accepts another mutation for the two columns 'c1' and 'c2' as:
>     > put 'test_table', 'row', 'F:c1', 'value2', 'F:c2', 'value2'
> 4. Then, the client will receive a RegionMovedException when issuing the next 
> request and will retry by opening a scanner on 'rsB'. The newly opened scanner 
> will have a higher mvcc than the old data, so it could read out the column 
> as : {column='F:c2' with value='value2'} and return the result to application.
>    Therefore, the application will get data as:
> 'row'    column='F:c1'   value='value1'
> 'row'    column='F:c2'   value='value2'
>    The returned data is combined from two different mutations and violates 
> the row-level transaction.
> {code}
> The reason is that the scanner newly opened after the region moved will get 
> a different mvcc. I am not sure whether this result is by design for scan if 
> partial row results are allowed. However, a row result combined from 
> different transactions may leave the application in an unexpected state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
