The project under filter is for dynamic expansion of * column. Since the join filter is referring to columns in the project's output, it's not possible to push filter past that project.
On Wed, Oct 26, 2016 at 1:50 PM, Zelaine Fong <[email protected]> wrote: > The filter, I assume you' referring to, is a join filter. So, at a > minimum, it needs to be applied after the hash join. I'm not sure there's > a lot of benefit in pushing that filter past the project that's on top of > the hash join. > > -- Zelaine > > On Wed, Oct 26, 2016 at 8:59 AM, Khurram Faraaz <[email protected]> > wrote: > >> Hi All, >> >> Filter is seen on top of Project in query plan for a null equality join. >> This is over CSV data, shouldn't the filter appear below the project in the >> query plan ? >> I am on Drill 1.9.0 git commit id: a29f1e29 >> >> Note : t1 has some nulls in it >> t2 does not have any nulls in it. >> >> {noformat} >> 0: jdbc:drill:schema=dfs.tmp> explain plan for >> select * from `oneColDupsWnulls.csv` t1 JOIN `oneColWOnulls.csv` t2 >> ON t1.columns[0] = t2.columns[0] >> WHERE t1.columns[0] IS NOT DISTINCT FROM t2.columns[0] >> OR ( t1.columns[0] IS NULL AND t2.columns[0] IS NULL ); >> +------+------+ >> | text | json | >> +------+------+ >> | 00-00 Screen >> 00-01 ProjectAllowDup(*=[$0], *0=[$1]) >> 00-02 Project(T43¦¦*=[$0], T44¦¦*=[$2]) >> 00-03 SelectionVectorRemover >> 00-04 Filter(condition=[OR(CAST(CASE(IS NULL(ITEM($1, 0)), IS >> NULL(ITEM($3, 0)), IS NULL(ITEM($3, 0)), IS NULL(ITEM($1, 0)), =(ITEM($1, >> 0), ITEM($3, 0)))):BOOLEAN NOT NULL, AND(IS NULL(ITEM($1, 0)), IS >> NULL(ITEM($3, 0))))]) >> 00-05 Project(T43¦¦*=[$0], columns=[$1], T44¦¦*=[$3], >> columns0=[$4]) >> 00-06 HashJoin(condition=[=($2, $5)], joinType=[inner]) >> 00-07 Project(T44¦¦*=[$0], columns0=[$1], $f20=[$2]) >> 00-09 Project(T44¦¦*=[$0], columns=[$1], $f2=[ITEM($1, >> 0)]) >> 00-11 Project(T44¦¦*=[$0], columns=[$1]) >> 00-13 Scan(groupscan=[EasyGroupScan >> [selectionRoot=maprfs:/tmp/oneColWOnulls.csv, numFiles=1, columns=[`*`], >> files=[maprfs:///tmp/oneColWOnulls.csv]]]) >> 00-08 Project(T43¦¦*=[$0], columns=[$1], $f2=[ITEM($1, >> 0)]) >> 00-10 Project(T43¦¦*=[$0], columns=[$1]) >> 00-12 Scan(groupscan=[EasyGroupScan >> [selectionRoot=maprfs:/tmp/oneColDupsWnulls.csv, numFiles=1, >> columns=[`*`], >> files=[maprfs:///tmp/oneColDupsWnulls.csv]]]) >> {noformat} >> >> Results returned by query >> >> {noformat} >> 0: jdbc:drill:schema=dfs.tmp> select * from `oneColDupsWnulls.csv` t1 JOIN >> `oneColWOnulls.csv` t2 ON t1.columns[0] = t2.columns[0] WHERE t1.columns[0] >> IS NOT DISTINCT FROM t2.columns[0] OR ( t1.columns[0] IS NULL AND >> t2.columns[0] IS NULL ); >> +-------------+-------------+ >> | columns | columns0 | >> +-------------+-------------+ >> | ["test"] | ["test"] | >> | ["foo"] | ["foo"] | >> | ["foo"] | ["foo"] | >> | ["bar"] | ["bar"] | >> | ["yes"] | ["yes"] | >> | ["yes"] | ["yes"] | >> | ["no"] | ["no"] | >> | ["no"] | ["no"] | >> | ["foobar"] | ["foobar"] | >> | ["foobar"] | ["foobar"] | >> | ["never"] | ["never"] | >> | ["never"] | ["never"] | >> | ["ever"] | ["ever"] | >> | ["ever"] | ["ever"] | >> | ["here"] | ["here"] | >> | ["there"] | ["there"] | >> | ["no"] | ["no"] | >> | ["no"] | ["no"] | >> | ["yes"] | ["yes"] | >> | ["yes"] | ["yes"] | >> | ["foobar"] | ["foobar"] | >> | ["foobar"] | ["foobar"] | >> | ["temp"] | ["temp"] | >> +-------------+-------------+ >> 23 rows selected (0.341 seconds) >> {noformat} >> >> Thanks, >> Khurram >>
