[ 
https://issues.apache.org/jira/browse/PHOENIX-6318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kadir OZDEMIR reassigned PHOENIX-6318:
--------------------------------------

    Assignee: Abhishek Singh Chouhan

> Phoenix client to set maxTimestamp on scans
> -------------------------------------------
>
>                 Key: PHOENIX-6318
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6318
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Kadir OZDEMIR
>            Assignee: Abhishek Singh Chouhan
>            Priority: Major
>
> On regular (non-SCN) connections, Phoenix clients do not set the time range 
> for scans. This means that a region server will include all the mutations 
> that have been applied to its table region at the time the scan is opened on 
> the region server. This creates some consistency issues if (1) a single 
> Phoenix query needs to be executed on multiple table regions, (2) a region 
> scanner implemented by Phoenix, e.g., indexing or paging region scanners, 
> closes or reopens the underlying HBase scanner, or (3) HBase itself needs to 
> close and reopen the scanner due to its internal activities, e.g., region 
> movement, split or merge.
> The consistency issue for the data tables is that the rows returned by the 
> query would not accurately represent a point in time image of a table. The 
> consistency issue for index tables can be even more severe as the results may 
> include more than one index row (each with a different row key) for the same data 
> table row. In other words, the result set of a query on an index table may 
> include stale index rows.
> A simple approach to address this issue is to let the Phoenix client set the 
> max timestamp for scans and set the same timestamp for all scans generated 
> for the same Phoenix query (instance). If the clock skew between clients and 
> servers is not large, this approach will greatly improve the consistency for 
> Phoenix queries.
> The side effect of this approach is that a scan may miss a mutation that has 
> already been applied if (1) the clock skew between clients and servers is 
> more than the time between the start of processing the mutation on a server 
> and the start of the scan to read the same mutation on a client, and (2) the 
> client wall clock is behind the server wall clock. We assume that this side 
> effect will rarely happen and that the benefit of improving the consistency 
> of Phoenix queries will outweigh it.
> In the future, we can consider better approaches to set the scan max timestamp 
> more accurately.
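
The proposal and its side effect can be illustrated with a toy model. This is
only a sketch, not Phoenix or HBase code: Region, Cell and run_query are
hypothetical names invented here, and the model assumes the upper bound of the
time range is exclusive, as in an HBase time range [min, max). It shows a
two-region query that mixes old and new values when no max timestamp is set,
the same query returning a point-in-time image when both scans share one max
timestamp, and a lagging client clock hiding a mutation that was already
applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    value: str
    ts: int  # timestamp assigned by the server clock, in ms


class Region:
    """Toy stand-in for a table region holding versioned cells."""

    def __init__(self):
        self.cells = []

    def put(self, row, value, ts):
        self.cells.append(Cell(row, value, ts))

    def scan(self, max_ts=None):
        """Return the newest visible value per row; max_ts is exclusive."""
        latest = {}
        for cell in self.cells:
            if max_ts is not None and cell.ts >= max_ts:
                continue  # outside the scan's time range
            seen = latest.get(cell.row)
            if seen is None or cell.ts > seen.ts:
                latest[cell.row] = cell
        return {row: cell.value for row, cell in latest.items()}


def run_query(max_ts=None):
    """Scan two regions in turn while a write lands between the scans."""
    r1, r2 = Region(), Region()
    r1.put("a", "old", ts=100)
    r2.put("b", "old", ts=100)
    result = dict(r1.scan(max_ts))
    # A concurrent mutation touches both regions after the first scan.
    r1.put("a", "new", ts=200)
    r2.put("b", "new", ts=200)
    result.update(r2.scan(max_ts))
    return result


# No max timestamp: the second scan sees the later write, the first does not.
mixed = run_query()
# One max timestamp shared by both scans: a consistent point-in-time image.
snapshot = run_query(max_ts=150)

# Side effect: the server applied the write at 1000, but the client clock
# reads 990, so a scan bounded by the client clock does not see it.
region = Region()
region.put("c", "v1", ts=1000)
missed = region.scan(max_ts=990)
seen = region.scan(max_ts=1001)
```

In this model, mixed is {'a': 'old', 'b': 'new'}, snapshot is
{'a': 'old', 'b': 'old'}, missed is empty, and seen contains row c.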



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
