Also, if the project operator is not doing any expression evaluation, the project operator itself would not introduce any big overhead. There probably is no big benefit if we push filter past the project operator, as Zelaine said.
On Wed, Oct 26, 2016 at 3:07 PM, Jinfeng Ni <[email protected]> wrote: > The project under filter is for dynamic expansion of * column. Since > the join filter is referring to columns in the project's output, it's > not possible to push filter past that project. > > > > On Wed, Oct 26, 2016 at 1:50 PM, Zelaine Fong <[email protected]> wrote: >> The filter, I assume you' referring to, is a join filter. So, at a >> minimum, it needs to be applied after the hash join. I'm not sure there's >> a lot of benefit in pushing that filter past the project that's on top of >> the hash join. >> >> -- Zelaine >> >> On Wed, Oct 26, 2016 at 8:59 AM, Khurram Faraaz <[email protected]> >> wrote: >> >>> Hi All, >>> >>> Filter is seen on top of Project in query plan for a null equality join. >>> This is over CSV data, shouldn't the filter appear below the project in the >>> query plan ? >>> I am on Drill 1.9.0 git commit id: a29f1e29 >>> >>> Note : t1 has some nulls in it >>> t2 does not have any nulls in it. >>> >>> {noformat} >>> 0: jdbc:drill:schema=dfs.tmp> explain plan for >>> select * from `oneColDupsWnulls.csv` t1 JOIN `oneColWOnulls.csv` t2 >>> ON t1.columns[0] = t2.columns[0] >>> WHERE t1.columns[0] IS NOT DISTINCT FROM t2.columns[0] >>> OR ( t1.columns[0] IS NULL AND t2.columns[0] IS NULL ); >>> +------+------+ >>> | text | json | >>> +------+------+ >>> | 00-00 Screen >>> 00-01 ProjectAllowDup(*=[$0], *0=[$1]) >>> 00-02 Project(T43¦¦*=[$0], T44¦¦*=[$2]) >>> 00-03 SelectionVectorRemover >>> 00-04 Filter(condition=[OR(CAST(CASE(IS NULL(ITEM($1, 0)), IS >>> NULL(ITEM($3, 0)), IS NULL(ITEM($3, 0)), IS NULL(ITEM($1, 0)), =(ITEM($1, >>> 0), ITEM($3, 0)))):BOOLEAN NOT NULL, AND(IS NULL(ITEM($1, 0)), IS >>> NULL(ITEM($3, 0))))]) >>> 00-05 Project(T43¦¦*=[$0], columns=[$1], T44¦¦*=[$3], >>> columns0=[$4]) >>> 00-06 HashJoin(condition=[=($2, $5)], joinType=[inner]) >>> 00-07 Project(T44¦¦*=[$0], columns0=[$1], $f20=[$2]) >>> 00-09 Project(T44¦¦*=[$0], columns=[$1], $f2=[ITEM($1, >>> 0)]) >>> 00-11 Project(T44¦¦*=[$0], columns=[$1]) >>> 00-13 Scan(groupscan=[EasyGroupScan >>> [selectionRoot=maprfs:/tmp/oneColWOnulls.csv, numFiles=1, columns=[`*`], >>> files=[maprfs:///tmp/oneColWOnulls.csv]]]) >>> 00-08 Project(T43¦¦*=[$0], columns=[$1], $f2=[ITEM($1, >>> 0)]) >>> 00-10 Project(T43¦¦*=[$0], columns=[$1]) >>> 00-12 Scan(groupscan=[EasyGroupScan >>> [selectionRoot=maprfs:/tmp/oneColDupsWnulls.csv, numFiles=1, >>> columns=[`*`], >>> files=[maprfs:///tmp/oneColDupsWnulls.csv]]]) >>> {noformat} >>> >>> Results returned by query >>> >>> {noformat} >>> 0: jdbc:drill:schema=dfs.tmp> select * from `oneColDupsWnulls.csv` t1 JOIN >>> `oneColWOnulls.csv` t2 ON t1.columns[0] = t2.columns[0] WHERE t1.columns[0] >>> IS NOT DISTINCT FROM t2.columns[0] OR ( t1.columns[0] IS NULL AND >>> t2.columns[0] IS NULL ); >>> +-------------+-------------+ >>> | columns | columns0 | >>> +-------------+-------------+ >>> | ["test"] | ["test"] | >>> | ["foo"] | ["foo"] | >>> | ["foo"] | ["foo"] | >>> | ["bar"] | ["bar"] | >>> | ["yes"] | ["yes"] | >>> | ["yes"] | ["yes"] | >>> | ["no"] | ["no"] | >>> | ["no"] | ["no"] | >>> | ["foobar"] | ["foobar"] | >>> | ["foobar"] | ["foobar"] | >>> | ["never"] | ["never"] | >>> | ["never"] | ["never"] | >>> | ["ever"] | ["ever"] | >>> | ["ever"] | ["ever"] | >>> | ["here"] | ["here"] | >>> | ["there"] | ["there"] | >>> | ["no"] | ["no"] | >>> | ["no"] | ["no"] | >>> | ["yes"] | ["yes"] | >>> | ["yes"] | ["yes"] | >>> | ["foobar"] | ["foobar"] | >>> | ["foobar"] | ["foobar"] | >>> | ["temp"] | ["temp"] | >>> +-------------+-------------+ >>> 23 rows selected (0.341 seconds) >>> {noformat} >>> >>> Thanks, >>> Khurram >>>
