[
https://issues.apache.org/jira/browse/DRILL-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294638#comment-14294638
]
Aman Sinha commented on DRILL-2087:
-----------------------------------
Yes, this is a good find. For a hash exchange that is an input to a Join, we
can only remove the hash exchange on one side if the other side of the join is
a broadcast exchange. Otherwise we would get wrong result.
> Some planner option setting might produce a wrong plan and wrong query result
> for join query.
> ---------------------------------------------------------------------------------------------
>
> Key: DRILL-2087
> URL: https://issues.apache.org/jira/browse/DRILL-2087
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
>
> I have the following parquet data ( essentially tpch's nation/region tables,
> but break into multiple parts).
> {code}
> ls -l nation
> total 24
> -rw-r--r--@ 1 jni staff 1376 Jan 27 16:47 01.parquet
> -rw-r--r--@ 1 jni staff 1400 Jan 27 16:47 02.parquet
> -rw-r--r--@ 1 jni staff 1279 Jan 27 16:47 03.parquet
> ls -l region
> total 24
> -rw-r--r--@ 1 jni staff 564 Jan 27 16:54 01.parquet
> -rw-r--r--@ 1 jni staff 491 Jan 27 16:54 02.parquet
> -rw-r--r--@ 1 jni staff 506 Jan 27 16:54 03.parquet
> {code}
> With the default planner setting, the following query produce the correct
> result :
> {code}
> 0: jdbc:drill:zk=local> select t1.n_nationkey from
> dfs.`/Users/jni/work/data/parquet/region` t2 join
> dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey =
> t2.r_regionkey;
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
> details.
> +-------------+
> | n_nationkey |
> +-------------+
> | 18 |
> | 12 |
> | 9 |
> | 8 |
> | 21 |
> | 19 |
> | 7 |
> | 6 |
> | 23 |
> | 22 |
> | 20 |
> | 13 |
> | 11 |
> | 10 |
> | 4 |
> | 0 |
> | 16 |
> | 15 |
> | 14 |
> | 5 |
> | 24 |
> | 17 |
> | 3 |
> | 2 |
> | 1 |
> +-------------+
> 25 rows selected (2.057 seconds)
> {code}
> However, if I modify the planner option (`planner.slice_target` and
> `planner.join.row_count_estimate_factor` ) as following, then the same query
> would return 0 row, which is not correct.
> {code}
> 0: jdbc:drill:zk=local> alter session set `planner.slice_target` = 10;
> +------------+------------+
> | ok | summary |
> +------------+------------+
> | true | planner.slice_target updated. |
> +------------+------------+
> 1 row selected (0.093 seconds)
> 0: jdbc:drill:zk=local> alter session set
> `planner.join.row_count_estimate_factor` = 0.1;
> +------------+------------+
> | ok | summary |
> +------------+------------+
> | true | planner.join.row_count_estimate_factor updated. |
> +------------+------------+
> 1 row selected (0.041 seconds)
> 0: jdbc:drill:zk=local> select t1.n_nationkey from
> dfs.`/Users/jni/work/data/parquet/region` t2 join
> dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey =
> t2.r_regionkey;
> +-------------+
> | n_nationkey |
> +-------------+
> +-------------+
> No rows selected (0.71 seconds)
> {code}
> If we look at explain plan result, after changing the planner option, we will
> see for the JOIN operator, the LEFT does not have any EXCHANGE operator,
> while the RIGHT has HashToRandomExchange operator. That seems to be not a
> valid plan.
> {code}
> 00-00 Screen
> 00-01 UnionExchange
> 01-01 Project(n_nationkey=[$2])
> 01-02 HashJoin(condition=[=($1, $0)], joinType=[inner])
> 01-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
> [path=file:/Users/jni/work/data/parquet/region]],
> selectionRoot=/Users/jni/work/data/parquet/region, numFiles=1,
> columns=[`r_regionkey`]]])
> 01-03 HashToRandomExchange(dist0=[[$0]])
> 02-01 Project(n_regionkey=[$1], n_nationkey=[$0])
> 02-02 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/nation]],
> selectionRoot=/Users/jni/work/data/parquet/nation, numFiles=1,
> columns=[`n_regionkey`, `n_nationkey`]]])
> {code}
> The cause of this problem seems to be that Drill will remove EXCHANGE
> operator under some conditions. That condition does not guarantee that a
> JOIN operator always either has EXCHANGE on both sides, or none.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)