The filter, I assume you' referring to, is a join filter. So, at a minimum, it needs to be applied after the hash join. I'm not sure there's a lot of benefit in pushing that filter past the project that's on top of the hash join.
-- Zelaine On Wed, Oct 26, 2016 at 8:59 AM, Khurram Faraaz <[email protected]> wrote: > Hi All, > > Filter is seen on top of Project in query plan for a null equality join. > This is over CSV data, shouldn't the filter appear below the project in the > query plan ? > I am on Drill 1.9.0 git commit id: a29f1e29 > > Note : t1 has some nulls in it > t2 does not have any nulls in it. > > {noformat} > 0: jdbc:drill:schema=dfs.tmp> explain plan for > select * from `oneColDupsWnulls.csv` t1 JOIN `oneColWOnulls.csv` t2 > ON t1.columns[0] = t2.columns[0] > WHERE t1.columns[0] IS NOT DISTINCT FROM t2.columns[0] > OR ( t1.columns[0] IS NULL AND t2.columns[0] IS NULL ); > +------+------+ > | text | json | > +------+------+ > | 00-00 Screen > 00-01 ProjectAllowDup(*=[$0], *0=[$1]) > 00-02 Project(T43¦¦*=[$0], T44¦¦*=[$2]) > 00-03 SelectionVectorRemover > 00-04 Filter(condition=[OR(CAST(CASE(IS NULL(ITEM($1, 0)), IS > NULL(ITEM($3, 0)), IS NULL(ITEM($3, 0)), IS NULL(ITEM($1, 0)), =(ITEM($1, > 0), ITEM($3, 0)))):BOOLEAN NOT NULL, AND(IS NULL(ITEM($1, 0)), IS > NULL(ITEM($3, 0))))]) > 00-05 Project(T43¦¦*=[$0], columns=[$1], T44¦¦*=[$3], > columns0=[$4]) > 00-06 HashJoin(condition=[=($2, $5)], joinType=[inner]) > 00-07 Project(T44¦¦*=[$0], columns0=[$1], $f20=[$2]) > 00-09 Project(T44¦¦*=[$0], columns=[$1], $f2=[ITEM($1, > 0)]) > 00-11 Project(T44¦¦*=[$0], columns=[$1]) > 00-13 Scan(groupscan=[EasyGroupScan > [selectionRoot=maprfs:/tmp/oneColWOnulls.csv, numFiles=1, columns=[`*`], > files=[maprfs:///tmp/oneColWOnulls.csv]]]) > 00-08 Project(T43¦¦*=[$0], columns=[$1], $f2=[ITEM($1, > 0)]) > 00-10 Project(T43¦¦*=[$0], columns=[$1]) > 00-12 Scan(groupscan=[EasyGroupScan > [selectionRoot=maprfs:/tmp/oneColDupsWnulls.csv, numFiles=1, > columns=[`*`], > files=[maprfs:///tmp/oneColDupsWnulls.csv]]]) > {noformat} > > Results returned by query > > {noformat} > 0: jdbc:drill:schema=dfs.tmp> select * from `oneColDupsWnulls.csv` t1 JOIN > `oneColWOnulls.csv` t2 ON t1.columns[0] = t2.columns[0] WHERE t1.columns[0] > IS NOT DISTINCT FROM t2.columns[0] OR ( t1.columns[0] IS NULL AND > t2.columns[0] IS NULL ); > +-------------+-------------+ > | columns | columns0 | > +-------------+-------------+ > | ["test"] | ["test"] | > | ["foo"] | ["foo"] | > | ["foo"] | ["foo"] | > | ["bar"] | ["bar"] | > | ["yes"] | ["yes"] | > | ["yes"] | ["yes"] | > | ["no"] | ["no"] | > | ["no"] | ["no"] | > | ["foobar"] | ["foobar"] | > | ["foobar"] | ["foobar"] | > | ["never"] | ["never"] | > | ["never"] | ["never"] | > | ["ever"] | ["ever"] | > | ["ever"] | ["ever"] | > | ["here"] | ["here"] | > | ["there"] | ["there"] | > | ["no"] | ["no"] | > | ["no"] | ["no"] | > | ["yes"] | ["yes"] | > | ["yes"] | ["yes"] | > | ["foobar"] | ["foobar"] | > | ["foobar"] | ["foobar"] | > | ["temp"] | ["temp"] | > +-------------+-------------+ > 23 rows selected (0.341 seconds) > {noformat} > > Thanks, > Khurram >
