Re: Filter appears above project in query plan - Null Equality Join

Zelaine Fong Wed, 26 Oct 2016 13:51:14 -0700

The filter, I assume you' referring to, is a join filter.  So, at a
minimum, it needs to be applied after the hash join.  I'm not sure there's
a lot of benefit in pushing that filter past the project that's on top of
the hash join.


-- Zelaine

On Wed, Oct 26, 2016 at 8:59 AM, Khurram Faraaz <[email protected]>
wrote:

> Hi All,
>
> Filter is seen on top of Project in query plan for a null equality join.
> This is over CSV data, shouldn't the filter appear below the project in the
> query plan ?
> I am on Drill 1.9.0 git commit id: a29f1e29
>
> Note : t1 has some nulls in it
>            t2 does not have any nulls in it.
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for
> select * from `oneColDupsWnulls.csv` t1 JOIN `oneColWOnulls.csv` t2
> ON t1.columns[0] = t2.columns[0]
> WHERE t1.columns[0] IS NOT DISTINCT FROM t2.columns[0]
> OR ( t1.columns[0] IS NULL AND t2.columns[0] IS NULL );
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      ProjectAllowDup(*=[$0], *0=[$1])
> 00-02        Project(T43¦¦*=[$0], T44¦¦*=[$2])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[OR(CAST(CASE(IS NULL(ITEM($1, 0)), IS
> NULL(ITEM($3, 0)), IS NULL(ITEM($3, 0)), IS NULL(ITEM($1, 0)), =(ITEM($1,
> 0), ITEM($3, 0)))):BOOLEAN NOT NULL, AND(IS NULL(ITEM($1, 0)), IS
> NULL(ITEM($3, 0))))])
> 00-05              Project(T43¦¦*=[$0], columns=[$1], T44¦¦*=[$3],
> columns0=[$4])
> 00-06                HashJoin(condition=[=($2, $5)], joinType=[inner])
> 00-07                  Project(T44¦¦*=[$0], columns0=[$1], $f20=[$2])
> 00-09                    Project(T44¦¦*=[$0], columns=[$1], $f2=[ITEM($1,
> 0)])
> 00-11                      Project(T44¦¦*=[$0], columns=[$1])
> 00-13                        Scan(groupscan=[EasyGroupScan
> [selectionRoot=maprfs:/tmp/oneColWOnulls.csv, numFiles=1, columns=[`*`],
> files=[maprfs:///tmp/oneColWOnulls.csv]]])
> 00-08                  Project(T43¦¦*=[$0], columns=[$1], $f2=[ITEM($1,
> 0)])
> 00-10                    Project(T43¦¦*=[$0], columns=[$1])
> 00-12                      Scan(groupscan=[EasyGroupScan
> [selectionRoot=maprfs:/tmp/oneColDupsWnulls.csv, numFiles=1,
> columns=[`*`],
> files=[maprfs:///tmp/oneColDupsWnulls.csv]]])
> {noformat}
>
> Results returned by query
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from `oneColDupsWnulls.csv` t1 JOIN
> `oneColWOnulls.csv` t2 ON t1.columns[0] = t2.columns[0] WHERE t1.columns[0]
> IS NOT DISTINCT FROM t2.columns[0] OR ( t1.columns[0] IS NULL AND
> t2.columns[0] IS NULL );
> +-------------+-------------+
> |   columns   |  columns0   |
> +-------------+-------------+
> | ["test"]    | ["test"]    |
> | ["foo"]     | ["foo"]     |
> | ["foo"]     | ["foo"]     |
> | ["bar"]     | ["bar"]     |
> | ["yes"]     | ["yes"]     |
> | ["yes"]     | ["yes"]     |
> | ["no"]      | ["no"]      |
> | ["no"]      | ["no"]      |
> | ["foobar"]  | ["foobar"]  |
> | ["foobar"]  | ["foobar"]  |
> | ["never"]   | ["never"]   |
> | ["never"]   | ["never"]   |
> | ["ever"]    | ["ever"]    |
> | ["ever"]    | ["ever"]    |
> | ["here"]    | ["here"]    |
> | ["there"]   | ["there"]   |
> | ["no"]      | ["no"]      |
> | ["no"]      | ["no"]      |
> | ["yes"]     | ["yes"]     |
> | ["yes"]     | ["yes"]     |
> | ["foobar"]  | ["foobar"]  |
> | ["foobar"]  | ["foobar"]  |
> | ["temp"]    | ["temp"]    |
> +-------------+-------------+
> 23 rows selected (0.341 seconds)
> {noformat}
>
> Thanks,
> Khurram
>

Re: Filter appears above project in query plan - Null Equality Join

Reply via email to