Hi All,

Filter is seen on top of Project in query plan for a null equality join.
This is over CSV data, shouldn't the filter appear below the project in the
query plan ?
I am on Drill 1.9.0 git commit id: a29f1e29

Note : t1 has some nulls in it
           t2 does not have any nulls in it.

{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for
select * from `oneColDupsWnulls.csv` t1 JOIN `oneColWOnulls.csv` t2
ON t1.columns[0] = t2.columns[0]
WHERE t1.columns[0] IS NOT DISTINCT FROM t2.columns[0]
OR ( t1.columns[0] IS NULL AND t2.columns[0] IS NULL );
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      ProjectAllowDup(*=[$0], *0=[$1])
00-02        Project(T43¦¦*=[$0], T44¦¦*=[$2])
00-03          SelectionVectorRemover
00-04            Filter(condition=[OR(CAST(CASE(IS NULL(ITEM($1, 0)), IS
NULL(ITEM($3, 0)), IS NULL(ITEM($3, 0)), IS NULL(ITEM($1, 0)), =(ITEM($1,
0), ITEM($3, 0)))):BOOLEAN NOT NULL, AND(IS NULL(ITEM($1, 0)), IS
NULL(ITEM($3, 0))))])
00-05              Project(T43¦¦*=[$0], columns=[$1], T44¦¦*=[$3],
columns0=[$4])
00-06                HashJoin(condition=[=($2, $5)], joinType=[inner])
00-07                  Project(T44¦¦*=[$0], columns0=[$1], $f20=[$2])
00-09                    Project(T44¦¦*=[$0], columns=[$1], $f2=[ITEM($1,
0)])
00-11                      Project(T44¦¦*=[$0], columns=[$1])
00-13                        Scan(groupscan=[EasyGroupScan
[selectionRoot=maprfs:/tmp/oneColWOnulls.csv, numFiles=1, columns=[`*`],
files=[maprfs:///tmp/oneColWOnulls.csv]]])
00-08                  Project(T43¦¦*=[$0], columns=[$1], $f2=[ITEM($1, 0)])
00-10                    Project(T43¦¦*=[$0], columns=[$1])
00-12                      Scan(groupscan=[EasyGroupScan
[selectionRoot=maprfs:/tmp/oneColDupsWnulls.csv, numFiles=1, columns=[`*`],
files=[maprfs:///tmp/oneColDupsWnulls.csv]]])
{noformat}

Results returned by query

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `oneColDupsWnulls.csv` t1 JOIN
`oneColWOnulls.csv` t2 ON t1.columns[0] = t2.columns[0] WHERE t1.columns[0]
IS NOT DISTINCT FROM t2.columns[0] OR ( t1.columns[0] IS NULL AND
t2.columns[0] IS NULL );
+-------------+-------------+
|   columns   |  columns0   |
+-------------+-------------+
| ["test"]    | ["test"]    |
| ["foo"]     | ["foo"]     |
| ["foo"]     | ["foo"]     |
| ["bar"]     | ["bar"]     |
| ["yes"]     | ["yes"]     |
| ["yes"]     | ["yes"]     |
| ["no"]      | ["no"]      |
| ["no"]      | ["no"]      |
| ["foobar"]  | ["foobar"]  |
| ["foobar"]  | ["foobar"]  |
| ["never"]   | ["never"]   |
| ["never"]   | ["never"]   |
| ["ever"]    | ["ever"]    |
| ["ever"]    | ["ever"]    |
| ["here"]    | ["here"]    |
| ["there"]   | ["there"]   |
| ["no"]      | ["no"]      |
| ["no"]      | ["no"]      |
| ["yes"]     | ["yes"]     |
| ["yes"]     | ["yes"]     |
| ["foobar"]  | ["foobar"]  |
| ["foobar"]  | ["foobar"]  |
| ["temp"]    | ["temp"]    |
+-------------+-------------+
23 rows selected (0.341 seconds)
{noformat}

Thanks,
Khurram

Reply via email to