[jira] [Commented] (PHOENIX-3209) Ensure scans run at specific server timestamp for UPSERT SELECT to same table

James Taylor (JIRA) Wed, 17 May 2017 15:46:23 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014887#comment-16014887
 ]


James Taylor commented on PHOENIX-3209:
---------------------------------------

When a client uses the UPDATE_CACHE_FREQUENCY feature, they're basically saying 
that they *don't* want to ping the server for a timestamp at which to run the 
query every time. So in this case, we don't restrict the upper time range of 
the scan. The rule in SQL is that a statement should not see the changes that 
it's making. This only matters when the same table is being read from and 
written to. HBase already prevents this through its MVCC model, but that 
protection breaks down if a split occurs, because we'll end up issuing a new 
scan under a different MVCC lock. This is definitely a corner case. It requires 
a split to occur, and some of the data the statement has already written must 
land in a new daughter region that hasn't been read yet.
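
To make the corner case concrete, here is a toy sketch in Python. It is not Phoenix or HBase code: ToyTable, scan, and upsert_select_copy are hypothetical names, the exclusive upper bound on the timestamp is an assumption of the model, and the second loop pass merely stands in for a scan that gets re-issued after a split.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy multi-version table: row key -> list of (timestamp, value) cells."""
    cells: dict = field(default_factory=dict)

    def put(self, key, value, ts):
        self.cells.setdefault(key, []).append((ts, value))

    def scan(self, max_ts=None):
        """Return the newest visible value per key.

        max_ts is an exclusive upper bound on cell timestamps (an assumption
        of this toy model); None means the scan is unbounded.
        """
        result = {}
        for key, versions in self.cells.items():
            visible = [c for c in versions if max_ts is None or c[0] < max_ts]
            if visible:
                result[key] = max(visible, key=lambda c: c[0])[1]
        return result


def upsert_select_copy(table, statement_ts, bounded):
    """Simulate an UPSERT SELECT that copies every row of a table into itself.

    The scan is issued twice; the second pass stands in for the scan that
    is re-issued after a region split.
    """
    max_ts = statement_ts if bounded else None
    seen, written = set(), []
    for _ in range(2):
        for key, value in sorted(table.scan(max_ts).items()):
            if key in seen:
                continue  # row was already handled by the earlier scan
            seen.add(key)
            new_key = key + "_copy"
            table.put(new_key, value, statement_ts)
            written.append(new_key)
    return written
```

With rows a and b written before timestamp 100, the bounded run at statement_ts=100 writes only a_copy and b_copy. The unbounded run also picks up its own writes on the second pass and writes a_copy_copy and b_copy_copy as well.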

The point you're making is a different issue: should SQL see 
future-timestamped data? The current behavior is inconsistent. If you're using 
UPDATE_CACHE_FREQUENCY, you'd see the future-timestamped data unless you get a 
cache miss (after expiration), in which case the query would be run with an 
upper time bound. I'm not sure what the best answer is for this. Maybe we 
should always hit the server for an UPSERT SELECT or a DELETE that issues a 
scan? How about filing a new JIRA for this one so we can brainstorm?
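
As a starting point for that discussion, the behavior described above and the "always hit the server" idea can be sketched as a small decision rule. This is Python pseudologic with hypothetical names (must_ping_server, scan_upper_bound, always_ping_for_scanning_dml), not Phoenix's actual client code, and it assumes the cache counts as expired once its age reaches the configured frequency.

```python
def must_ping_server(cache_age_ms, update_cache_frequency_ms,
                     is_upsert_select=False, is_scanning_delete=False,
                     always_ping_for_scanning_dml=False):
    """Decide whether the client asks the server for a timestamp.

    always_ping_for_scanning_dml models the option floated above: ignore
    the cache for an UPSERT SELECT or a DELETE that issues a scan.
    """
    if cache_age_ms >= update_cache_frequency_ms:
        return True  # cache miss after expiration: today's behavior
    if always_ping_for_scanning_dml and (is_upsert_select or is_scanning_delete):
        return True  # proposed behavior: these statements always ping
    return False


def scan_upper_bound(server_ts, pinged):
    """Upper time bound for the scan; None means the scan is unbounded."""
    return server_ts if pinged else None
```

With the option off, a fresh cache means no ping and an unbounded scan, which is where future-timestamped data becomes visible. With it on, an UPSERT SELECT gets a server timestamp and a bounded scan regardless of cache age.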

> Ensure scans run at specific server timestamp for UPSERT SELECT to same table
> -----------------------------------------------------------------------------
>
>                 Key: PHOENIX-3209
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3209
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Maddineni Sukumar
>             Fix For: 4.11.0
>
>
> This is a corner case of specifying an UPDATE_CACHE_FREQUENCY on a table and 
> executing an UPSERT SELECT. Without an UPDATE_CACHE_FREQUENCY, we ping the 
> server to ensure we have the latest version of the schema. We'll then run the 
> query based on the server timestamp returned as a result of checking that the 
> schema is up-to-date. If an UPDATE_CACHE_FREQUENCY is set, we skip this RPC, 
> which is a potential problem in this case. This becomes more likely when we 
> introduce a default UPDATE_CACHE_FREQUENCY with PHOENIX-2885. The fix is to 
> ignore the UPDATE_CACHE_FREQUENCY when an UPSERT SELECT is performed where 
> the source and target table are the same.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
