[ 
https://issues.apache.org/jira/browse/DRILL-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294638#comment-14294638
 ] 

Aman Sinha commented on DRILL-2087:
-----------------------------------

Yes, this is a good find.  For a hash exchange that is an input to a Join, we 
can only remove the hash exchange on one side if the other side of the join is 
a broadcast exchange.  Otherwise we would get wrong result. 

> Some planner option setting might produce a wrong plan and wrong query result 
> for join query.
> ---------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2087
>                 URL: https://issues.apache.org/jira/browse/DRILL-2087
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> I have the following parquet data ( essentially tpch's nation/region tables, 
> but break into multiple parts).
> {code}
> ls -l nation
> total 24
> -rw-r--r--@ 1 jni  staff  1376 Jan 27 16:47 01.parquet
> -rw-r--r--@ 1 jni  staff  1400 Jan 27 16:47 02.parquet
> -rw-r--r--@ 1 jni  staff  1279 Jan 27 16:47 03.parquet
> ls -l region
> total 24
> -rw-r--r--@ 1 jni  staff  564 Jan 27 16:54 01.parquet
> -rw-r--r--@ 1 jni  staff  491 Jan 27 16:54 02.parquet
> -rw-r--r--@ 1 jni  staff  506 Jan 27 16:54 03.parquet
> {code}
> With the default planner setting, the following query produce the correct 
> result :
> {code}
> 0: jdbc:drill:zk=local> select t1.n_nationkey from 
> dfs.`/Users/jni/work/data/parquet/region` t2 join 
> dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = 
> t2.r_regionkey;
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> +-------------+
> | n_nationkey |
> +-------------+
> | 18          |
> | 12          |
> | 9           |
> | 8           |
> | 21          |
> | 19          |
> | 7           |
> | 6           |
> | 23          |
> | 22          |
> | 20          |
> | 13          |
> | 11          |
> | 10          |
> | 4           |
> | 0           |
> | 16          |
> | 15          |
> | 14          |
> | 5           |
> | 24          |
> | 17          |
> | 3           |
> | 2           |
> | 1           |
> +-------------+
> 25 rows selected (2.057 seconds)
> {code}
> However, if I modify the planner option (`planner.slice_target` and 
> `planner.join.row_count_estimate_factor`  ) as following, then the same query 
> would return 0 row, which is not correct. 
> {code}
> 0: jdbc:drill:zk=local> alter session set `planner.slice_target` = 10;
> +------------+------------+
> |     ok     |  summary   |
> +------------+------------+
> | true       | planner.slice_target updated. |
> +------------+------------+
> 1 row selected (0.093 seconds)
> 0: jdbc:drill:zk=local> alter session set 
> `planner.join.row_count_estimate_factor` = 0.1;
> +------------+------------+
> |     ok     |  summary   |
> +------------+------------+
> | true       | planner.join.row_count_estimate_factor updated. |
> +------------+------------+
> 1 row selected (0.041 seconds)
> 0: jdbc:drill:zk=local> select t1.n_nationkey from 
> dfs.`/Users/jni/work/data/parquet/region` t2 join 
> dfs.`/Users/jni/work/data/parquet/nation` t1 on t1.n_regionkey = 
> t2.r_regionkey;
> +-------------+
> | n_nationkey |
> +-------------+
> +-------------+
> No rows selected (0.71 seconds)
> {code}
> If we look at explain plan result, after changing the planner option, we will 
> see for the JOIN operator, the LEFT does not have any EXCHANGE operator, 
> while the RIGHT has HashToRandomExchange operator.  That seems to be not a 
> valid plan.
> {code}
> 00-00    Screen
> 00-01      UnionExchange
> 01-01        Project(n_nationkey=[$2])
> 01-02          HashJoin(condition=[=($1, $0)], joinType=[inner])
> 01-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=file:/Users/jni/work/data/parquet/region]], 
> selectionRoot=/Users/jni/work/data/parquet/region, numFiles=1, 
> columns=[`r_regionkey`]]])
> 01-03            HashToRandomExchange(dist0=[[$0]])
> 02-01              Project(n_regionkey=[$1], n_nationkey=[$0])
> 02-02                Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=file:/Users/jni/work/data/parquet/nation]], 
> selectionRoot=/Users/jni/work/data/parquet/nation, numFiles=1, 
> columns=[`n_regionkey`, `n_nationkey`]]])
> {code} 
> The cause of this problem seems to be that Drill will  remove EXCHANGE 
> operator under some conditions.  That condition does not guarantee that a 
> JOIN operator always either has EXCHANGE on both sides, or none. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to