OK. If there isn't much of a performance gain, as the responses suggest, then there isn't much that can be done here.
Thanks,
Khurram

On Thu, Oct 27, 2016 at 3:41 AM, Jinfeng Ni wrote:

> Also, if the project operator is not doing any expression evaluation,
> the project operator itself would not introduce any big overhead.
> There probably is no big benefit if we push the filter past the project
> operator, as Zelaine said.
>
>
> On Wed, Oct 26, 2016 at 3:07 PM, Jinfeng Ni wrote:
> > The project under the filter is for dynamic expansion of the * column. Since
> > the join filter is referring to columns in the project's output, it's
> > not possible to push the filter past that project.
> >
> >
> >
> > On Wed, Oct 26, 2016 at 1:50 PM, Zelaine Fong wrote:
> >> The filter you're referring to, I assume, is a join filter. So, at a
> >> minimum, it needs to be applied after the hash join. I'm not sure there's
> >> a lot of benefit in pushing that filter past the project that's on top of
> >> the hash join.
> >>
> >> -- Zelaine
> >>
> >> On Wed, Oct 26, 2016 at 8:59 AM, Khurram Faraaz wrote:
> >>
> >>> Hi All,
> >>>
> >>> Filter is seen on top of Project in the query plan for a null equality join.
> >>> This is over CSV data; shouldn't the filter appear below the project in the
> >>> query plan?
> >>> I am on Drill 1.9.0, git commit id: a29f1e29
> >>>
> >>> Note: t1 has some nulls in it;
> >>> t2 does not have any nulls in it.
> >>>
> >>> {noformat}
> >>> 0: jdbc:drill:schema=dfs.tmp> explain plan for
> >>> select * from `oneColDupsWnulls.csv` t1 JOIN `oneColWOnulls.csv` t2
> >>> ON t1.columns[0] = t2.columns[0]
> >>> WHERE t1.columns[0] IS NOT DISTINCT FROM t2.columns[0]
> >>> OR ( t1.columns[0] IS NULL AND t2.columns[0] IS NULL );
> >>> +------+------+
> >>> | text | json |
> >>> +------+------+
> >>> | 00-00    Screen
> >>> 00-01      ProjectAllowDup(*=[$0], *0=[$1])
> >>> 00-02        Project(T43¦¦*=[$0], T44¦¦*=[$2])
> >>> 00-03          SelectionVectorRemover
> >>> 00-04            Filter(condition=[OR(CAST(CASE(IS NULL(ITEM($1, 0)), IS NULL(ITEM($3, 0)), IS NULL(ITEM($3, 0)), IS NULL(ITEM($1, 0)), =(ITEM($1, 0), ITEM($3, 0)))):BOOLEAN NOT NULL, AND(IS NULL(ITEM($1, 0)), IS NULL(ITEM($3, 0))))])
> >>> 00-05              Project(T43¦¦*=[$0], columns=[$1], T44¦¦*=[$3], columns0=[$4])
> >>> 00-06                HashJoin(condition=[=($2, $5)], joinType=[inner])
> >>> 00-07                  Project(T44¦¦*=[$0], columns0=[$1], $f20=[$2])
> >>> 00-09                    Project(T44¦¦*=[$0], columns=[$1], $f2=[ITEM($1, 0)])
> >>> 00-11                      Project(T44¦¦*=[$0], columns=[$1])
> >>> 00-13                        Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/oneColWOnulls.csv, numFiles=1, columns=[`*`], files=[maprfs:///tmp/oneColWOnulls.csv]]])
> >>> 00-08                  Project(T43¦¦*=[$0], columns=[$1], $f2=[ITEM($1, 0)])
> >>> 00-10                    Project(T43¦¦*=[$0], columns=[$1])
> >>> 00-12                      Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/oneColDupsWnulls.csv, numFiles=1, columns=[`*`], files=[maprfs:///tmp/oneColDupsWnulls.csv]]])
> >>> {noformat}
> >>>
> >>> Results returned by the query:
> >>>
> >>> {noformat}
> >>> 0: jdbc:drill:schema=dfs.tmp> select * from `oneColDupsWnulls.csv` t1 JOIN `oneColWOnulls.csv` t2 ON t1.columns[0] = t2.columns[0] WHERE t1.columns[0] IS NOT DISTINCT FROM t2.columns[0] OR ( t1.columns[0] IS NULL AND t2.columns[0] IS NULL );
> >>> +-------------+-------------+
> >>> |   columns   |  columns0   |
> >>> +-------------+-------------+
> >>> | ["test"]    | ["test"]    |
> >>> | ["foo"]     | ["foo"]     |
> >>> | ["foo"]     | ["foo"]     |
> >>> | ["bar"]     | ["bar"]     |
> >>> | ["yes"]     | ["yes"]     |
> >>> | ["yes"]     | ["yes"]     |
> >>> | ["no"]      | ["no"]      |
> >>> | ["no"]      | ["no"]      |
> >>> | ["foobar"]  | ["foobar"]  |
> >>> | ["foobar"]  | ["foobar"]  |
> >>> | ["never"]   | ["never"]   |
> >>> | ["never"]   | ["never"]   |
> >>> | ["ever"]    | ["ever"]    |
> >>> | ["ever"]    | ["ever"]    |
> >>> | ["here"]    | ["here"]    |
> >>> | ["there"]   | ["there"]   |
> >>> | ["no"]      | ["no"]      |
> >>> | ["no"]      | ["no"]      |
> >>> | ["yes"]     | ["yes"]     |
> >>> | ["yes"]     | ["yes"]     |
> >>> | ["foobar"]  | ["foobar"]  |
> >>> | ["foobar"]  | ["foobar"]  |
> >>> | ["temp"]    | ["temp"]    |
> >>> +-------------+-------------+
> >>> 23 rows selected (0.341 seconds)
> >>> {noformat}
> >>>
> >>> Thanks,
> >>> Khurram
> >>>
>
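To make the Filter at 00-04 concrete, here is a minimal Python sketch. It assumes SQL NULL is modeled as `None` and three-valued logic as `True`/`False`/`None`. It mirrors the CASE expansion of `IS NOT DISTINCT FROM` printed in the plan and applies it to the output of an inner equi-join. The helper names and sample keys are hypothetical; this is an illustration of the SQL semantics, not Drill code.

```python
from typing import List, Optional, Tuple

Key = Optional[str]  # None stands in for SQL NULL (assumption of this sketch)


def sql_eq(a: Key, b: Key) -> Optional[bool]:
    """SQL '=' under three-valued logic: UNKNOWN (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def is_not_distinct_from(a: Key, b: Key) -> bool:
    """Null-safe equality, following the plan's rewrite:
    CASE WHEN a IS NULL THEN b IS NULL
         WHEN b IS NULL THEN a IS NULL
         ELSE a = b END
    """
    if a is None:
        return b is None
    if b is None:
        return a is None  # always False here, since a is not NULL
    return a == b


def filter_condition(a: Key, b: Key) -> bool:
    """OR(<null-safe equality>, AND(a IS NULL, b IS NULL)), as in Filter 00-04."""
    return is_not_distinct_from(a, b) or (a is None and b is None)


def inner_equi_join(left: List[Key], right: List[Key]) -> List[Tuple[Key, Key]]:
    """Nested-loop inner join on '='; pairs whose comparison is UNKNOWN are dropped."""
    return [(lk, rk) for lk in left for rk in right if sql_eq(lk, rk) is True]


# Hypothetical keys: t1 contains NULLs, t2 does not.
t1: List[Key] = ["foo", None, "foo", "bar", None]
t2: List[Key] = ["foo", "bar", "baz"]

joined = inner_equi_join(t1, t2)
filtered = [(lk, rk) for lk, rk in joined if filter_condition(lk, rk)]

print(len(joined), len(filtered))  # 3 3
print(sql_eq(None, None), is_not_distinct_from(None, None))  # None True
```

Because `=` in the join condition yields UNKNOWN whenever either key is NULL, no NULL-keyed row survives the inner join, so every joined row already satisfies the Filter (in the sketch, `filtered == joined`). This is consistent with the result set above, which contains no NULL keys even though t1 has some.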
