[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

James Taylor (JIRA) Mon, 10 Oct 2016 17:35:01 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547135#comment-15547135
 ]


James Taylor edited comment on PHOENIX-6 at 10/11/16 12:33 AM:
---------------------------------------------------------------

Would you mind filing a separate JIRA for your use case? If I understand it 
correctly, you'd like to be able to have multiple atomic insert/update 
statements for the same row in the same commit batch and have them sequentially 
execute? Would the atomicity be for the entire commit batch (for all rows being 
operated on) or if not, how would potentially overlapping rows from other 
batches be handled?

For your question, I think you meant to use the same table name for both 
statements, right? For non transactional tables, the UPSERT SELECT wouldn't see 
the data from the UPSERT VALUES. When auto commit is off, the data is kept in 
memory on the client until the commit occurs. The UPSERT VALUES would cause a 
new row to be cached on the client (in memory) and the UPSERT SELECT would 
cause N rows to be cached on the client (where N is the number of rows 
currently in the table). The commit essentially generates the HBase Puts and 
does a batch mutate.

For transactional tables, you do see your own updates. In that case, the UPSERT 
SELECT would see the row from the UPSERT VALUES call.

One more thought I had, [~cameron.hatfield]. If we allowed multiple UPSERT 
VALUES ... ON DUPLICATE KEY statements for the same row key, it might work as 
you'd like. This could potentially be accomplished by having a different 
version for each Cell. In that case, each statement would operate under its own 
row lock on the latest value. Is that more of what you're looking for?


was (Author: jamestaylor):
Would you mind filing a separate JIRA for your use case? If I understand it 
correctly, you'd like to be able to have multiple atomic insert/update 
statements for the same row in the same commit batch and have them sequentially 
execute? Would the atomicity be for the entire commit batch (for all rows being 
operated on) or if not, how would potentially overlapping rows from other 
batches be handled?

For your question, I think you meant to use the same table name for both 
statements, right? For non transactional tables, the UPSERT SELECT wouldn't see 
the data from the UPSERT VALUES. When auto commit is off, the data is kept in 
memory on the client until the commit occurs. The UPSERT VALUES would cause a 
new row to be cached on the client (in memory) and the UPSERT SELECT would 
cause N rows to be cached on the client (where N is the number of rows 
currently in the table). The commit essentially generates the HBase Puts and 
does a batch mutate.

For transactional tables, you do see your own updates. In that case, the UPSERT 
SELECT would see the row from the UPSERT VALUES call.

> Support ON DUPLICATE KEY construct
> ----------------------------------
>
>                 Key: PHOENIX-6
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: James Taylor
>            Assignee: James Taylor
>             Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should 
> support the "on duplicate key" construct for UPSERT. With this construct, the 
> UPSERT VALUES statement would run atomically and would thus require a read 
> before write which would obviously have a negative impact on performance. For 
> an example of similar syntax , see MySQL documentation at 
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: 
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J. 
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already 
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE 
> <column>=<expression>, ...] ]
> {code}
> The following restrictions will apply:
> - The <column> may not be part of the primary key constraint - only KeyValue 
> columns will be allowed.
> To handle the maintenance of immutable indexes, we'll need to push the 
> maintenance to the server side.
> This is because the mutations for indexes on immutable tables are calculated 
> on the client-side, while this new syntax would potentially modify the value 
> on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (PHOENIX-6) Support ON DUPLICATE KEY construct

Reply via email to