The project under filter is for dynamic expansion of * column. Since
the join filter is referring to columns in the project's output, it's
not possible to push filter past that project.



On Wed, Oct 26, 2016 at 1:50 PM, Zelaine Fong <[email protected]> wrote:
> The filter, I assume you' referring to, is a join filter.  So, at a
> minimum, it needs to be applied after the hash join.  I'm not sure there's
> a lot of benefit in pushing that filter past the project that's on top of
> the hash join.
>
> -- Zelaine
>
> On Wed, Oct 26, 2016 at 8:59 AM, Khurram Faraaz <[email protected]>
> wrote:
>
>> Hi All,
>>
>> Filter is seen on top of Project in query plan for a null equality join.
>> This is over CSV data, shouldn't the filter appear below the project in the
>> query plan ?
>> I am on Drill 1.9.0 git commit id: a29f1e29
>>
>> Note : t1 has some nulls in it
>>            t2 does not have any nulls in it.
>>
>> {noformat}
>> 0: jdbc:drill:schema=dfs.tmp> explain plan for
>> select * from `oneColDupsWnulls.csv` t1 JOIN `oneColWOnulls.csv` t2
>> ON t1.columns[0] = t2.columns[0]
>> WHERE t1.columns[0] IS NOT DISTINCT FROM t2.columns[0]
>> OR ( t1.columns[0] IS NULL AND t2.columns[0] IS NULL );
>> +------+------+
>> | text | json |
>> +------+------+
>> | 00-00    Screen
>> 00-01      ProjectAllowDup(*=[$0], *0=[$1])
>> 00-02        Project(T43¦¦*=[$0], T44¦¦*=[$2])
>> 00-03          SelectionVectorRemover
>> 00-04            Filter(condition=[OR(CAST(CASE(IS NULL(ITEM($1, 0)), IS
>> NULL(ITEM($3, 0)), IS NULL(ITEM($3, 0)), IS NULL(ITEM($1, 0)), =(ITEM($1,
>> 0), ITEM($3, 0)))):BOOLEAN NOT NULL, AND(IS NULL(ITEM($1, 0)), IS
>> NULL(ITEM($3, 0))))])
>> 00-05              Project(T43¦¦*=[$0], columns=[$1], T44¦¦*=[$3],
>> columns0=[$4])
>> 00-06                HashJoin(condition=[=($2, $5)], joinType=[inner])
>> 00-07                  Project(T44¦¦*=[$0], columns0=[$1], $f20=[$2])
>> 00-09                    Project(T44¦¦*=[$0], columns=[$1], $f2=[ITEM($1,
>> 0)])
>> 00-11                      Project(T44¦¦*=[$0], columns=[$1])
>> 00-13                        Scan(groupscan=[EasyGroupScan
>> [selectionRoot=maprfs:/tmp/oneColWOnulls.csv, numFiles=1, columns=[`*`],
>> files=[maprfs:///tmp/oneColWOnulls.csv]]])
>> 00-08                  Project(T43¦¦*=[$0], columns=[$1], $f2=[ITEM($1,
>> 0)])
>> 00-10                    Project(T43¦¦*=[$0], columns=[$1])
>> 00-12                      Scan(groupscan=[EasyGroupScan
>> [selectionRoot=maprfs:/tmp/oneColDupsWnulls.csv, numFiles=1,
>> columns=[`*`],
>> files=[maprfs:///tmp/oneColDupsWnulls.csv]]])
>> {noformat}
>>
>> Results returned by query
>>
>> {noformat}
>> 0: jdbc:drill:schema=dfs.tmp> select * from `oneColDupsWnulls.csv` t1 JOIN
>> `oneColWOnulls.csv` t2 ON t1.columns[0] = t2.columns[0] WHERE t1.columns[0]
>> IS NOT DISTINCT FROM t2.columns[0] OR ( t1.columns[0] IS NULL AND
>> t2.columns[0] IS NULL );
>> +-------------+-------------+
>> |   columns   |  columns0   |
>> +-------------+-------------+
>> | ["test"]    | ["test"]    |
>> | ["foo"]     | ["foo"]     |
>> | ["foo"]     | ["foo"]     |
>> | ["bar"]     | ["bar"]     |
>> | ["yes"]     | ["yes"]     |
>> | ["yes"]     | ["yes"]     |
>> | ["no"]      | ["no"]      |
>> | ["no"]      | ["no"]      |
>> | ["foobar"]  | ["foobar"]  |
>> | ["foobar"]  | ["foobar"]  |
>> | ["never"]   | ["never"]   |
>> | ["never"]   | ["never"]   |
>> | ["ever"]    | ["ever"]    |
>> | ["ever"]    | ["ever"]    |
>> | ["here"]    | ["here"]    |
>> | ["there"]   | ["there"]   |
>> | ["no"]      | ["no"]      |
>> | ["no"]      | ["no"]      |
>> | ["yes"]     | ["yes"]     |
>> | ["yes"]     | ["yes"]     |
>> | ["foobar"]  | ["foobar"]  |
>> | ["foobar"]  | ["foobar"]  |
>> | ["temp"]    | ["temp"]    |
>> +-------------+-------------+
>> 23 rows selected (0.341 seconds)
>> {noformat}
>>
>> Thanks,
>> Khurram
>>

Reply via email to