[
https://issues.apache.org/jira/browse/PHOENIX-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269483#comment-17269483
]
ASF GitHub Bot commented on PHOENIX-6232:
-----------------------------------------
dbwong edited a comment on pull request #992:
URL: https://github.com/apache/phoenix/pull/992#issuecomment-764819524
Can we close this PR? I see the code in 4.x and I see
https://issues.apache.org/jira/browse/PHOENIX-6232 is marked as resolved.
@comnetwork
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Correlated subquery should not push to RegionServer as the probe side of the
> Hash join
> --------------------------------------------------------------------------------------
>
> Key: PHOENIX-6232
> URL: https://issues.apache.org/jira/browse/PHOENIX-6232
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.15.0
> Reporter: Mate Szalay-Beko
> Assignee: chenglei
> Priority: Major
> Fix For: 5.1.0, 4.16.0
>
> Attachments: PHOENIX-6232_addendum-4.x.patch,
> PHOENIX-6232_addendum-master.diff, PHOENIX-6232_v1-4.x.patch,
> PHOENIX-6232_v1-master.patch
>
>
> We were facing an interesting problem when a more complex query (with inner
> selects in the WHERE clause) succeeds alone, while the same query fails, if
> it is part of a join. I created a test table / query to reproduce the problem:
> {code:sql}
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (
> id INTEGER NOT NULL,
> test_id INTEGER,
> lastchanged TIMESTAMP,
> CONSTRAINT my_pk PRIMARY KEY (id));
> UPSERT INTO test VALUES(0, 100, '2000-01-01 00:00:00.0');
> UPSERT INTO test VALUES(1, 101, '2000-01-01 00:00:00.0');
> UPSERT INTO test VALUES(2, 100, '2011-11-11 11:11:11.0');
> {code}
> *Query 1:* Example query, running fine in itself:
> {code:sql}
> SELECT id, test_id, lastchanged FROM test T
> WHERE lastchanged = ( SELECT max(lastchanged) FROM test WHERE test_id =
> T.test_id )
> Returns:
> +----+---------+-----------------------+
> | ID | TEST_ID | LASTCHANGED |
> +----+---------+-----------------------+
> | 1 | 101 | 2000-01-01 01:00:00.0 |
> | 2 | 100 | 2011-11-11 12:11:11.0 |
> +----+---------+-----------------------+
> {code}
> *Query 2:* Same query fails on the current master branch, when it is part of
> a larger (implicit) join:
> {code:sql}
> SELECT AAA.*
> FROM
> (
> SELECT id, test_id, lastchanged FROM test T
> WHERE lastchanged = ( SELECT max(lastchanged) FROM test WHERE test_id =
> T.test_id )
> ) as AAA,
> (
> SELECT id FROM test
> ) as BBB
> WHERE AAA.id = BBB.id;
> java.lang.IllegalArgumentException
> at
> org.apache.phoenix.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:128)
> at
> org.apache.phoenix.compile.TupleProjectionCompiler.createProjectedTable(TupleProjectionCompiler.java:66)
> at
> org.apache.phoenix.compile.QueryCompiler.compileSingleFlatQuery(QueryCompiler.java:663)
> at
> org.apache.phoenix.compile.QueryCompiler.compileJoinQuery(QueryCompiler.java:404)
> at
> org.apache.phoenix.compile.QueryCompiler.compileJoinQuery(QueryCompiler.java:302)
> at
> org.apache.phoenix.compile.QueryCompiler.compileSelect(QueryCompiler.java:249)
> at
> org.apache.phoenix.compile.QueryCompiler.compile(QueryCompiler.java:176)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:504)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:467)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:309)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:298)
> at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
> at
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:297)
> at
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:290)
> at
> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1933)
> at sqlline.Commands.executeSingleQuery(Commands.java:1054)
> at sqlline.Commands.execute(Commands.java:1003)
> at sqlline.Commands.sql(Commands.java:967)
> at sqlline.SqlLine.dispatch(SqlLine.java:734)
> at sqlline.SqlLine.begin(SqlLine.java:541)
> at sqlline.SqlLine.start(SqlLine.java:267)
> at sqlline.SqlLine.main(SqlLine.java:206)
> {code}
> I am not sure what the problem is exactly. My guess is that Phoenix tries to
> optimize (flatten) an inner-query, which it shouldn't, if we are inside a
> join (according to the check in the code which throws the exception).
> The best workaround I found was to define an explicit join in the original
> query (Query 1), basically change the inner select into a join. This modified
> query return the same as the original one:
> *Query 3:*
> {code:sql}
> SELECT T.id, T.test_id, T.lastchanged
> FROM
> test T
> LEFT JOIN (
> SELECT max(lastchanged) AS max_timestamp,
> test_id AS max_timestamp_test_id
> FROM test
> GROUP BY test_id
> ) JOIN_TABLE ON JOIN_TABLE.max_timestamp_test_id = T.test_id
> WHERE T.lastchanged = JOIN_TABLE.max_timestamp
> Returns:
> +------+-----------+-----------------------+
> | T.ID | T.TEST_ID | T.LASTCHANGED |
> +------+-----------+-----------------------+
> | 1 | 101 | 2000-01-01 01:00:00.0 |
> | 2 | 100 | 2011-11-11 12:11:11.0 |
> +------+-----------+-----------------------+
> {code}
> *Query 4:* And the same modified query (query 3) now works inside a join:
> {code:sql}
> SELECT AAA.*
> FROM
> (
> SELECT T.id, T.test_id, T.lastchanged
> FROM
> test T
> LEFT JOIN (
> SELECT max(lastchanged) AS max_timestamp,
> test_id AS max_timestamp_test_id
> FROM test
> GROUP BY test_id
> ) JOIN_TABLE ON JOIN_TABLE.max_timestamp_test_id = T.test_id
> WHERE T.lastchanged = JOIN_TABLE.max_timestamp
> ) as AAA,
> (
> SELECT id FROM test
> ) as BBB
> WHERE AAA.id = BBB.id;
> Returns:
> +------+-----------+-----------------------+
> | T.ID | T.TEST_ID | T.LASTCHANGED |
> +------+-----------+-----------------------+
> | 1 | 101 | 2000-01-01 01:00:00.0 |
> | 2 | 100 | 2011-11-11 12:11:11.0 |
> +------+-----------+-----------------------+
> {code}
> I think Query 4 worked, as it is forcing Phoenix to drop the idea of
> optimizing it's inner-query (Query 3). Although, I can be wrong about the
> root cause...
> Anyway, I think the bug should be fixed and Query 2 should run without
> exception.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)