[
https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15547135#comment-15547135
]
James Taylor edited comment on PHOENIX-6 at 10/11/16 12:33 AM:
---------------------------------------------------------------
Would you mind filing a separate JIRA for your use case? If I understand it
correctly, you'd like to be able to have multiple atomic insert/update
statements for the same row in the same commit batch and have them sequentially
execute? Would the atomicity be for the entire commit batch (for all rows being
operated on) or if not, how would potentially overlapping rows from other
batches be handled?
For your question, I think you meant to use the same table name for both
statements, right? For non transactional tables, the UPSERT SELECT wouldn't see
the data from the UPSERT VALUES. When auto commit is off, the data is kept in
memory on the client until the commit occurs. The UPSERT VALUES would cause a
new row to be cached on the client (in memory) and the UPSERT SELECT would
cause N rows to be cached on the client (where N is the number of rows
currently in the table). The commit essentially generates the HBase Puts and
does a batch mutate.
For transactional tables, you do see your own updates. In that case, the UPSERT
SELECT would see the row from the UPSERT VALUES call.
One more thought I had, [~cameron.hatfield]. If we allowed multiple UPSERT
VALUES ... ON DUPLICATE KEY statements for the same row key, it might work as
you'd like. This could potentially be accomplished by having a different
version for each Cell. In that case, each statement would operate under its own
row lock on the latest value. Is that more of what you're looking for?
was (Author: jamestaylor):
Would you mind filing a separate JIRA for your use case? If I understand it
correctly, you'd like to be able to have multiple atomic insert/update
statements for the same row in the same commit batch and have them sequentially
execute? Would the atomicity be for the entire commit batch (for all rows being
operated on) or if not, how would potentially overlapping rows from other
batches be handled?
For your question, I think you meant to use the same table name for both
statements, right? For non transactional tables, the UPSERT SELECT wouldn't see
the data from the UPSERT VALUES. When auto commit is off, the data is kept in
memory on the client until the commit occurs. The UPSERT VALUES would cause a
new row to be cached on the client (in memory) and the UPSERT SELECT would
cause N rows to be cached on the client (where N is the number of rows
currently in the table). The commit essentially generates the HBase Puts and
does a batch mutate.
For transactional tables, you do see your own updates. In that case, the UPSERT
SELECT would see the row from the UPSERT VALUES call.
> Support ON DUPLICATE KEY construct
> ----------------------------------
>
> Key: PHOENIX-6
> URL: https://issues.apache.org/jira/browse/PHOENIX-6
> Project: Phoenix
> Issue Type: New Feature
> Reporter: James Taylor
> Assignee: James Taylor
> Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should
> support the "on duplicate key" construct for UPSERT. With this construct, the
> UPSERT VALUES statement would run atomically and would thus require a read
> before write which would obviously have a negative impact on performance. For
> an example of similar syntax , see MySQL documentation at
> http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail:
> https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J.
> A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0)
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already
> exists and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0)
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE
> <column>=<expression>, ...] ]
> {code}
> The following restrictions will apply:
> - The <column> may not be part of the primary key constraint - only KeyValue
> columns will be allowed.
> To handle the maintenance of immutable indexes, we'll need to push the
> maintenance to the server side.
> This is because the mutations for indexes on immutable tables are calculated
> on the client-side, while this new syntax would potentially modify the value
> on the server-side.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)