[ https://issues.apache.org/jira/browse/PHOENIX-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014887#comment-16014887 ]
James Taylor commented on PHOENIX-3209:
---------------------------------------
When a client uses the UPDATE_CACHE_FREQUENCY feature, they're basically saying
that they *don't* want to ping the server for a timestamp at which to run the
query every time. So in this case, we don't restrict the upper time range of
the scan. The rule in SQL is that a statement should not see the changes that
it's making. The only time this is an issue is when the same table is being
read and written to. HBase already prevents this through its MVCC model, but
this would break down if a split occurs, as we'd end up issuing a new scan
under a different MVCC lock. This is definitely a corner case. It would require
a split to occur and some of the previously written data to have landed in a
new daughter region that hadn't been read yet.
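A minimal Java sketch of the time-range choice described above. It assumes an HBase-style [min, max) time range with an exclusive upper bound, and it assumes that a row the statement itself wrote carries a timestamp at or after the server timestamp. The names ScanTimeBound, Cell, upperBound and visibleCells are hypothetical, not Phoenix APIs, and the split/MVCC interaction is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of how a statement's scan upper time bound is chosen (not Phoenix code). */
public class ScanTimeBound {

    /** A cell version carrying an HBase-style timestamp. */
    record Cell(String row, long timestamp) {}

    /**
     * Exclusive upper bound for the scan. With UPDATE_CACHE_FREQUENCY the client skips the
     * metadata RPC, so there is no server timestamp and the scan is left open-ended.
     */
    static long upperBound(boolean usesUpdateCacheFrequency, long serverTimestamp) {
        return usesUpdateCacheFrequency ? Long.MAX_VALUE : serverTimestamp;
    }

    /** Cells that a scan with the given exclusive upper bound would return. */
    static List<Cell> visibleCells(List<Cell> cells, long upperBoundExclusive) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (c.timestamp() < upperBoundExclusive) {
                out.add(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long serverTs = 100L;
        // r1 existed before the statement; r2 is assumed to be written by the statement itself.
        List<Cell> cells = List.of(new Cell("r1", 90L), new Cell("r2", serverTs));
        System.out.println(visibleCells(cells, upperBound(false, serverTs)).size()); // 1
        System.out.println(visibleCells(cells, upperBound(true, serverTs)).size());  // 2
    }
}
```

With the bounded scan the statement sees only r1; with the open-ended scan it also sees its own write, which is the SQL rule violation the comment describes.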
The point you're making is a different issue: should SQL see future-timestamped
data? Currently the behavior would be inconsistent: if you're using
UPDATE_CACHE_FREQUENCY, you'd see the future-timestamped data unless you get a
cache miss (after expiration), in which case the query would be run with an
upper time bound. I'm not sure what the best answer is for this. Maybe we
should always hit the server for an UPSERT SELECT or a DELETE that issues a
scan? How about filing a new JIRA for this one so we can brainstorm?
> Ensure scans run at specific server timestamp for UPSERT SELECT to same table
> -----------------------------------------------------------------------------
>
> Key: PHOENIX-3209
> URL: https://issues.apache.org/jira/browse/PHOENIX-3209
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: Maddineni Sukumar
> Fix For: 4.11.0
>
>
> This is a corner case of specifying an UPDATE_CACHE_FREQUENCY on a table and
> executing an UPSERT SELECT. Without an UPDATE_CACHE_FREQUENCY, we ping the
> server to ensure we have the latest version of the schema. We'll then run the
> query based on the server timestamp returned as a result of checking that the
> schema is up-to-date. If an UPDATE_CACHE_FREQUENCY is set, we skip this RPC,
> which is a potential problem in this case. This becomes more likely when we
> introduce a default UPDATE_CACHE_FREQUENCY with PHOENIX-2885. The fix is to
> ignore the UPDATE_CACHE_FREQUENCY when an UPSERT SELECT is performed where
> the source and target table are the same.
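A hypothetical Java sketch of the fix described in the issue: keep skipping the metadata RPC when UPDATE_CACHE_FREQUENCY is set, except for an UPSERT SELECT whose source and target table are the same. It assumes an UPDATE_CACHE_FREQUENCY of 0 means "check the server every time"; StatementKind, Statement and needsServerTimestamp are illustrative names, not Phoenix's actual classes.

```java
import java.util.Objects;

/** Hypothetical sketch of the PHOENIX-3209 rule (illustrative names, not Phoenix code). */
public class CacheFrequencyRule {

    enum StatementKind { SELECT, UPSERT_VALUES, UPSERT_SELECT, DELETE }

    /** A statement's kind plus the table it reads from and the table it writes to (may be null). */
    record Statement(StatementKind kind, String sourceTable, String targetTable) {}

    /**
     * Returns true when the client should ping the server for a timestamp before running.
     * Assumption: updateCacheFrequencyMs == 0 means no cache frequency is in effect.
     */
    static boolean needsServerTimestamp(Statement stmt, long updateCacheFrequencyMs) {
        if (updateCacheFrequencyMs == 0) {
            return true; // default path: always check the schema and get a server timestamp
        }
        // The fix: ignore UPDATE_CACHE_FREQUENCY for an UPSERT SELECT that reads and
        // writes the same table, so its scan gets a fixed upper time bound.
        return stmt.kind() == StatementKind.UPSERT_SELECT
                && Objects.equals(stmt.sourceTable(), stmt.targetTable());
    }

    public static void main(String[] args) {
        Statement sameTable = new Statement(StatementKind.UPSERT_SELECT, "T", "T");
        Statement otherTable = new Statement(StatementKind.UPSERT_SELECT, "S", "T");
        System.out.println(needsServerTimestamp(sameTable, 60_000L));  // true
        System.out.println(needsServerTimestamp(otherTable, 60_000L)); // false
    }
}
```

Keeping the check this narrow preserves the skipped RPC for every other statement; the broader option raised in the comment (always hitting the server for an UPSERT SELECT or a scanning DELETE) would widen the condition.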
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)