[jira] [Commented] (DRILL-5010) Equality join condition is treated as a MergeJoin and not as a HashJoin.

Zelaine Fong (JIRA) Tue, 08 Nov 2016 07:59:50 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15647945#comment-15647945
 ]


Zelaine Fong commented on DRILL-5010:
-------------------------------------

I'm not sure why the use of merge join vs hash join is a bug.  Whether one is 
chosen over the other is a costing decision made by the optimizer.

> Equality join condition is treated as a MergeJoin and not as a HashJoin.
> ------------------------------------------------------------------------
>
>                 Key: DRILL-5010
>                 URL: https://issues.apache.org/jira/browse/DRILL-5010
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.9.0
>            Reporter: Khurram Faraaz
>
> Equality join condition is treated as a MergeJoin and not as a HashJoin.
> Drill 1.9.0 git commit ID: 83513daf
> Projecting the join columns results in merge join, whereas it should be doing 
> a HashJoin.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for
> . . . . . . . . . . . . . . > select t1.intKey, t2.intKey from `left.json` 
> t1, `right_all_nulls.json` t2 
> WHERE t1.intKey = t2.intKey OR ( t1.intKey IS NULL AND t2.intKey IS NULL);
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(intKey=[$0], intKey0=[$1])
> 00-02        Project(intKey=[$0], intKey0=[$1])
> 00-03          MergeJoin(condition=[IS NOT DISTINCT FROM($0, $1)], 
> joinType=[inner])
> 00-05            SelectionVectorRemover
> 00-07              Sort(sort0=[$0], dir0=[ASC])
> 00-09                Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/tmp/left.json, numFiles=1, columns=[`intKey`], 
> files=[maprfs:///tmp/left.json]]])
> 00-04            Project(intKey0=[$0])
> 00-06              SelectionVectorRemover
> 00-08                Sort(sort0=[$0], dir0=[ASC])
> 00-10                  Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/tmp/right_all_nulls.json, numFiles=1, 
> columns=[`intKey`], files=[maprfs:///tmp/right_all_nulls.json]]])
> {noformat}
> Note that HashAgg and HashJoin were enabled.
> {noformat}
> | planner.enable_hashagg                     | BOOLEAN  | SYSTEM  | DEFAULT  
> | null        | null        | true      | null       |
> | planner.enable_hashjoin                    | BOOLEAN  | SYSTEM  | DEFAULT  
> | null        | null        | true      | null       |
> {noformat}
> Doing a SELECT <star> results in a HashJoin in the query plan
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for
> . . . . . . . . . . . . . . > select * from `left.json` t1, 
> `right_all_nulls.json` t2 
> WHERE t1.intKey = t2.intKey OR ( t1.intKey IS NULL AND t2.intKey IS NULL);
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      ProjectAllowDup(*=[$0], *0=[$1])
> 00-02        Project(T46¦¦*=[$0], T47¦¦*=[$2])
> 00-03          HashJoin(condition=[IS NOT DISTINCT FROM($1, $3)], 
> joinType=[inner])
> 00-04            Project(T47¦¦*=[$0], intKey0=[$1])
> 00-06              Project(T47¦¦*=[$0], intKey=[$1])
> 00-08                Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/tmp/right_all_nulls.json, numFiles=1, columns=[`*`], 
> files=[maprfs:///tmp/right_all_nulls.json]]])
> 00-05            Project(T46¦¦*=[$0], intKey=[$1])
> 00-07              Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/tmp/left.json, numFiles=1, columns=[`*`], 
> files=[maprfs:///tmp/left.json]]])
> {noformat}
> Data used in above queries
> {noformat}
> [root@centos-01 null_eq_joins]# cat left.json
> {
>     "intKey" : 123,
>     "bgintKey": 1234567,
>     "strKey": "this is a test string",
>     "boolKey": true,
>     "fltKey": 123.786,
>     "dblKey": 457.984,
>     "timKey": "18:30:45",
>     "dtKey": "1997-10-21",
>     "tmstmpKey": "2007-04-30 13:10:02.047",
>     "intrvldyKey": "P9DT38833S",
>     "intrvlyrKey": "P255M"
> }
> [root@centos-01 null_equality_joins]#
> [root@centos-01 null_eq_joins]# cat right_all_nulls.json
> {
>      "intKey" : null,
>      "bgintKey": null,
>      "strKey": null,
>      "boolKey": null,
>      "fltKey": null,
>      "dblKey": null,
>      "timKey": null,
>      "dtKey": null,
>      "tmstmpKey": null,
>      "intrvldyKey": null,
>      "intrvlyrKey": null
> }
> [root@centos-01 null_eq_joins]#
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-5010) Equality join condition is treated as a MergeJoin and not as a HashJoin.

Reply via email to