[
https://issues.apache.org/jira/browse/DRILL-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941377#comment-16941377
]
Aman Sinha commented on DRILL-7391:
-----------------------------------
I narrowed it down to the Calcite upgrade [1] that was done in Drill 1.13.0.
Just before the Calcite upgrade, the logical plan and the results are correct
(See below). After the upgrade, the logical plan has an extra 'columns' field
in the Project on the right input of the Left Outer Join, which is what is
causing the problem. The 'columns' is present because the ITEM expression
above the LOJ was only pushed down to the left side of the join but not to the
right side.
Here's the sequence which should clarify: (here I am using 2 different tables
compared to what I used in the Description but the result is similar)
{noformat}
// Before calcite upgrade in Drill 1.13.0
// RIGHT result
0: jdbc:drill:zk=local> select * from sys.version;
+------------------+-------------------------------------------+------------------------+----------------------------+----------------------+----------------------------+
| version | commit_id |
commit_message | commit_time | build_email |
build_time |
+------------------+-------------------------------------------+------------------------+----------------------------+----------------------+----------------------------+
| 1.13.0-SNAPSHOT | 450e67094eb6e9a6484d7f86c49b51c77a08d7b2 | REVERTED:
DRILL-5089 | 16.01.2018 @ 02:10:13 PST | [email protected] | 27.09.2019 @
11:25:45 PDT |
+------------------+-------------------------------------------+------------------------+----------------------------+----------------------+----------------------------+
0: jdbc:drill:zk=local> explain plan without implementation for select
tt7.columns[0], tt8.columns[0] as x from tt7 left outer join tt8 on
tt7.columns[0] = tt8.columns[0];
+------+------+
| text | json |
+------+------+
| DrillScreenRel
DrillProjectRel(EXPR$0=[$1], x=[$3])
DrillJoinRel(condition=[=($0, $2)], joinType=[left])
DrillProjectRel($f2=[ITEM($0, 0)], ITEM=[ITEM($0, 0)])
DrillScanRel(table=[[dfs, tmp, tt7]], groupscan=[EasyGroupScan
[selectionRoot=file:/tmp/tt7, numFiles=1, columns=[`columns`[0]],
files=[file:/tmp/tt7/0_0_0.csv]]])
DrillProjectRel($f2=[ITEM($0, 0)], ITEM=[ITEM($0, 0)])
DrillScanRel(table=[[dfs, tmp, tt8]], groupscan=[EasyGroupScan
[selectionRoot=file:/tmp/tt8, numFiles=1, columns=[`columns`[0]],
files=[file:/tmp/tt8/0_0_0.csv]]])
0: jdbc:drill:zk=local> select tt7.columns[0], tt8.columns[0] as x from tt7
left outer join tt8 on tt7.columns[0] = tt8.columns[0];
+---------+-------+
| EXPR$0 | x |
+---------+-------+
| 1414 | 1414 |
| 455 | null |
| 555 | null |
| 1414 | 1414 |
| 455 | null |
| 9669 | 9669 |
| 555 | null |
+---------+-------+
// After Calcite upgrade in Drill 1.13.0
// WRONG result
0: jdbc:drill:zk=local> select * from sys.version;
+------------------+-------------------------------------------+--------------------------------+----------------------------+----------------------+----------------------------+
| version | commit_id |
commit_message | commit_time | build_email |
build_time |
+------------------+-------------------------------------------+--------------------------------+----------------------------+----------------------+----------------------------+
| 1.13.0-SNAPSHOT | 3f0e517fb62a1ebad92fb473e787d343152920d6 | DRILL-3993:
Resolve conflicts | 16.01.2018 @ 02:10:13 PST | [email protected] |
27.09.2019 @ 11:53:51 PDT |
+------------------+-------------------------------------------+--------------------------------+----------------------------+----------------------+----------------------------+
0: jdbc:drill:zk=local> explain plan without implementation for select
tt7.columns[0], tt8.columns[0] as x from tt7 left outer join tt8 on
tt7.columns[0] = tt8.columns[0];
+------+------+
| text | json |
+------+------+
| DrillScreenRel
DrillProjectRel(EXPR$0=[$1], x=[ITEM($2, 0)])
DrillJoinRel(condition=[=($0, $3)], joinType=[left])
DrillProjectRel($f2=[ITEM($0, 0)], ITEM=[ITEM($0, 0)])
DrillScanRel(table=[[dfs, tmp, tt7]], groupscan=[EasyGroupScan
[selectionRoot=file:/tmp/tt7, numFiles=1, columns=[`columns`[0]],
files=[file:/tmp/tt7/0_0_0.csv]]])
DrillProjectRel(columns=[$0], $f2=[ITEM($0, 0)])
DrillScanRel(table=[[dfs, tmp, tt8]], groupscan=[EasyGroupScan
[selectionRoot=file:/tmp/tt8, numFiles=1, columns=[`columns`, `columns`[0]],
files=[file:/tmp/tt8/0_0_0.csv]]])
0: jdbc:drill:zk=local> select tt7.columns[0], tt8.columns[0] as x from tt7
left outer join tt8 on tt7.columns[0] = tt8.columns[0];
+---------+-------+
| EXPR$0 | x |
+---------+-------+
| 1414 | 9669 |
| 455 | null |
| 555 | null |
| 1414 | 9669 |
| 455 | null |
| 9669 | 9669 |
| 555 | null |
+---------+-------+
{noformat}
> Wrong result when doing left outer join on CSV table
> ----------------------------------------------------
>
> Key: DRILL-7391
> URL: https://issues.apache.org/jira/browse/DRILL-7391
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.16.0
> Reporter: Aman Sinha
> Assignee: Aman Sinha
> Priority: Major
> Fix For: 1.17.0
>
> Attachments: tt5.tar.gz, tt6.tar.gz
>
>
> The following query shows 1 row that is incorrect. For the non-null rows,
> both columns should have the same value. This is on CSV sample data (I will
> attach the files).
> {noformat}
> apache drill (dfs.tmp)> select tt5.columns[0], tt6.columns[0] from tt5 left
> outer join tt6 on tt5.columns[0] = tt6.columns[0];
> +--------+--------+
> | EXPR$0 | EXPR$1 |
> +--------+--------+
> | 455 | null |
> | 455 | null |
> | 555 | null |
> | 1414 | 1414 |
> | 455 | null |
> | 580 | null |
> | 1111 | null |
> | 555 | null |
> | 455 | null |
> | 455 | null |
> | 455 | null |
> | 455 | null |
> | 455 | null |
> | 555 | null |
> | 455 | null |
> | 455 | null |
> | 455 | null |
> | 580 | null |
> | 6767 | null |
> | 455 | null |
> | 555 | null |
> | 455 | null |
> | 555 | null |
> | 555 | null |
> | 555 | null |
> | 455 | null |
> | 555 | null |
> | 455 | null |
> | 455 | null |
> | 455 | null |
> | 6767 | null |
> | 555 | null |
> | 555 | null |
> | 455 | null |
> | 555 | null |
> | 555 | null |
> | 1414 | 1414 |
> | 455 | null |
> | 555 | null |
> | 555 | null |
> | 455 | null |
> | 455 | null |
> | 555 | null |
> | 455 | null |
> | 555 | null |
> | 555 | null |
> | 455 | null |
> | 455 | null |
> | 9669 | 1414 | <--- Wrong result
> | 555 | null |
> | 455 | null |
> | 455 | null |
> | 455 | null |
> | 555 | null |
> | 580 | null |
> | 455 | null |
> | 555 | null |
> | 455 | null |
> | 555 | null |
> | 455 | null |
> | 455 | null |
> | 409 | null |
> | 455 | null |
> | 555 | null |
> | 555 | null |
> | 455 | null |
> | 455 | null |
> | 555 | null |
> | 455 | null |
> | 555 | null |
> | 1414 | 1414 |
> | 455 | null |
> | 555 | null |
> | 555 | null |
> | 555 | null |
> +--------+--------+
> 75 rows selected
> {noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)