[
https://issues.apache.org/jira/browse/PHOENIX-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354794#comment-16354794
]
Maryann Xue commented on PHOENIX-4586:
--------------------------------------
I'll take a look, [~sergey.soldatov]
> UPSERT SELECT doesn't take in account comparison operators for subqueries.
> --------------------------------------------------------------------------
>
> Key: PHOENIX-4586
> URL: https://issues.apache.org/jira/browse/PHOENIX-4586
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.14.0
> Reporter: Sergey Soldatov
> Priority: Critical
> Fix For: 4.14.0
>
>
> If upsert select has a where condition that is using any comparison operator
> (including ANY/SOME/etc), the whole WHERE clause just ignored. Table:
> {noformat}
> create table T (id integer primary key, i1 integer);
> upsert into T values (1,1);
> upsert into T values (2,2);
> {noformat}
> Query that should not upsert anything because we have a condition in where
> that I1 should be greater than any value we already have as well as not
> existing ID:
> {noformat}
> 0: jdbc:phoenix:> upsert into T select id, 4 from T where id = 3 AND i1 >
> (select i1 from T);
> 2 rows affected (0.02 seconds)
> 0: jdbc:phoenix:> select * from T;
> +-----+-----+
> | ID | I1 |
> +-----+-----+
> | 1 | 4 |
> | 2 | 4 |
> +-----+-----+
> 2 rows selected (0.014 seconds)
> {noformat}
> Now with ANY. Should not upsert anything as well because ID is [1,2], while
> I1 are all '4':
> {noformat}
> 0: jdbc:phoenix:> upsert into T select id, 5 from T where id = 2 AND i1 = ANY
> (select ID from T);
> 2 rows affected (0.016 seconds)
> 0: jdbc:phoenix:> select * from T;
> +-----+-----+
> | ID | I1 |
> +-----+-----+
> | 1 | 5 |
> | 2 | 5 |
> +-----+-----+
> 2 rows selected (0.013 seconds)
> {noformat}
> A similar query with IN works just fine:
> {noformat}
> 0: jdbc:phoenix:> upsert into T select id, 6 from T where id = 2 AND i1 IN
> (select ID from T);
> No rows affected (0.094 seconds)
> 0: jdbc:phoenix:> select * from T;
> +-----+-----+
> | ID | I1 |
> +-----+-----+
> | 1 | 5 |
> | 2 | 5 |
> +-----+-----+
> 2 rows selected (0.014 seconds)
> {noformat}
> The reason for this behavior is that for IN we convert subselect to semi-join
> and execute upsert on the client side. For comparisons, we don't perform any
> transformations and query is considered flat and finally executed on the
> server side. Not sure why, but we also completely ignore the second
> condition in WHERE clause as well and that may lead to a serious data loss.
> [~jamestaylor], [~maryannxue] any thoughts or suggestions how to fix that are
> really appreciated.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)