Kadir OZDEMIR created PHOENIX-6318:
--------------------------------------
Summary: Phoenix client to set maxTimestamp on scans
Key: PHOENIX-6318
URL: https://issues.apache.org/jira/browse/PHOENIX-6318
Project: Phoenix
Issue Type: Improvement
Reporter: Kadir OZDEMIR
On regular (non-SCN) connections, the Phoenix client does not set the time range
for scans. This means that a region server will include all the mutations that
have been applied to its table region at the time the scan is opened on the
region server. This creates some consistency issues if (1) a single Phoenix
query needs to be executed on multiple table regions, (2) a region scanner
implemented by Phoenix, e.g., indexing or paging region scanners, closes or
reopens the underlying HBase scanner, or (3) HBase itself needs to close and
reopen the scanner due to its internal activities, e.g., region movement, split, or
merge.
The consistency issue for data tables is that the rows returned by the
query would not accurately represent a point-in-time image of the table. The
consistency issue for index tables can be even more severe, as the results
may include more than one index row (with different row keys) for the same data
table row. In other words, the result set of a query on an index table may
include stale index rows.
A simple approach to address this issue is to let the Phoenix client set the
max timestamp for scans and set the same timestamp for all scans generated for
the same Phoenix query (instance). If the clock skew between a Phoenix client
and server is not large, this approach will greatly improve the consistency of
the Phoenix queries.
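
To illustrate the idea, here is a minimal, self-contained sketch. It does not
use Phoenix or HBase code; Region, Cell, and query are hypothetical names made
up for this example. It assumes an exclusive upper bound on cell timestamps,
which is how the maxStamp argument of HBase's Scan.setTimeRange(minStamp,
maxStamp) behaves as far as we know. The point is only that one timestamp,
chosen once per query, is passed to every region scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScanTimestampSketch {
    /** Hypothetical cell version: row key, value, and write timestamp. */
    static class Cell {
        final String row;
        final String value;
        final long ts;
        Cell(String row, String value, long ts) {
            this.row = row;
            this.value = value;
            this.ts = ts;
        }
    }

    /** Hypothetical stand-in for one table region holding timestamped cells. */
    static class Region {
        private final List<Cell> cells = new ArrayList<>();

        void put(String row, String value, long ts) {
            cells.add(new Cell(row, value, ts));
        }

        /** Returns the cells whose timestamp is strictly below maxTs. */
        List<Cell> scan(long maxTs) {
            List<Cell> out = new ArrayList<>();
            for (Cell c : cells) {
                if (c.ts < maxTs) {
                    out.add(c);
                }
            }
            return out;
        }
    }

    /** Runs one "query": every region scan gets the same upper bound. */
    static List<Cell> query(List<Region> regions, long queryMaxTs) {
        List<Cell> result = new ArrayList<>();
        for (Region r : regions) {
            result.addAll(r.scan(queryMaxTs));
        }
        return result;
    }

    public static void main(String[] args) {
        Region a = new Region();
        Region b = new Region();
        a.put("row1", "v1", 90L);
        b.put("row2", "v1", 95L);
        // Written after the query started (at time 100) but before region b is scanned.
        b.put("row2", "v2", 105L);
        List<Region> regions = Arrays.asList(a, b);

        // Bounded: the late write is excluded from every region scan.
        System.out.println(query(regions, 100L).size());
        // Unbounded (no time range set): the late write leaks into the result.
        System.out.println(query(regions, Long.MAX_VALUE).size());
    }
}
```

With the shared bound of 100, the write stamped 105 is excluded no matter when
region b is scanned or how often its scanner is reopened. Without a bound, it
shows up in the result.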
The side effect of this approach is that a scan may miss a recently written
mutation if (1) the clock skew between client and server is more than the time
between the start of processing the mutation on the server and the start of
the scan to read the same mutation on the client, and (2) the client wall
clock is behind. We assume that this side effect will rarely happen and that
the benefit of improving the consistency of Phoenix queries will outweigh it.
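
The condition above can be checked with simple arithmetic. The sketch below is
only an illustration under stated assumptions: the mutation is stamped with the
server clock when its processing starts, the scan max timestamp is the client
wall clock when the scan starts, and the upper bound is exclusive. The class
and method names are hypothetical.

```java
class ClockSkewSketch {
    /**
     * Decides whether a scan sees a mutation under the assumptions above.
     *
     * @param mutationStart true time at which the server starts processing the mutation
     * @param scanStart     true time at which the client starts the scan
     * @param clientBehind  how far the client clock lags the server clock (skew)
     */
    static boolean scanSeesMutation(long mutationStart, long scanStart, long clientBehind) {
        long cellTs = mutationStart;               // assumed: stamped by the server clock
        long scanMaxTs = scanStart - clientBehind; // assumed: client wall clock at scan start
        return cellTs < scanMaxTs;                 // exclusive upper bound
    }

    public static void main(String[] args) {
        // The scan starts 50 time units after the mutation in both cases.
        System.out.println(scanSeesMutation(1000L, 1050L, 20L)); // skew smaller than the gap
        System.out.println(scanSeesMutation(1000L, 1050L, 80L)); // skew larger than the gap
    }
}
```

In this model the mutation is missed only when the client lags by at least the
gap between the two start times. A client clock that runs ahead never hides
the mutation.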
In the future, we can consider better approaches to set the scan max timestamp more
accurately.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)