[jira] [Commented] (DRILL-2170) Wrong result when joining to a subquery that has group-by, order-by and LIMIT

Aman Sinha (JIRA) Wed, 11 Feb 2015 15:00:34 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317164#comment-14317164
 ]


Aman Sinha commented on DRILL-2170:
-----------------------------------

[~jnadeau] could you please review ?  

Note that there is a separate issue of the Explain plan not looking 'right' 
considering that the TopN shown in Fragment 2 (see Explain plan in description) 
may run in parallel but there is no merge operation after that.   I will create 
a separate JIRA for it.

> Wrong result when joining to a subquery that has group-by, order-by and LIMIT
> -----------------------------------------------------------------------------
>
>                 Key: DRILL-2170
>                 URL: https://issues.apache.org/jira/browse/DRILL-2170
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.7.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: 
> 0001-DRILL-2170-For-fragment-parallelization-use-max-cost.patch
>
>
> On TPCH SF1:  first query gives right result, second produces wrong result.  
> The main difference is the order-by in the second query is on the aggregation 
> function, not the grouping key.  
> {code}
> // Correct result 
> : jdbc:drill:zk=local> select count(*) from (select l_orderkey, 
> sum(l_quantity), sum(l_extendedprice) from lineitem group by l_orderkey order 
> by 1 limit 100) sq inner join orders o on sq.l_orderkey = o.o_orderkey;
> +------------+
> |   EXPR$0   |
> +------------+
> | 100        |
> +------------+
> 1 row selected (2.67 seconds)
> // Wrong result
> 0: jdbc:drill:zk=local> select count(*) from (select l_orderkey, 
> sum(l_quantity), sum(l_extendedprice) from lineitem group by l_orderkey order 
> by 3 limit 100) sq inner join orders o on sq.l_orderkey = o.o_orderkey;
> +------------+
> |   EXPR$0   |
> +------------+
> | 400        |
> +------------+
> 1 row selected (3.163 seconds)
> The plan for the second query shows that there's a missing 
> SingleMergeExchange after the TopN operation.  This is needed because before 
> the TopN we did a HashToRandomExchange and since we are doing a LIMIT, we 
> need to merge the output of TopN into a single stream. 
> {code}
> 0: jdbc:drill:zk=local> explain plan for select count(*) from (select 
> l_orderkey, sum(l_quantity), sum(l_extendedprice) from lineitem group by 
> l_orderkey order by 3 limit 100) sq inner join orders o on sq.l_orderkey = 
> o.o_orderkey;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
> 00-02        UnionExchange
> 01-01          StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 01-02            Project($f0=[0])
> 01-03              HashJoin(condition=[=($0, $1)], joinType=[inner])
> 01-05                HashToRandomExchange(dist0=[[$0]])
> 02-01                  Project(l_orderkey=[$0])
> 02-02                    SelectionVectorRemover
> 02-03                      Limit(fetch=[100])
> 02-04                        SelectionVectorRemover
> 02-05                          TopN(limit=[100])
> 02-06                            HashToRandomExchange(dist0=[[$2]])
> 04-01                              HashAgg(group=[{0}], EXPR$1=[SUM($1)], 
> EXPR$2=[SUM($2)])
> 04-02                                HashToRandomExchange(dist0=[[$0]])
> 05-01                                  HashAgg(group=[{0}], EXPR$1=[SUM($1)], 
> EXPR$2=[SUM($2)])
> 05-02                                    Project(l_orderkey=[$1], 
> l_quantity=[$2], l_extendedprice=[$0])
> 05-03                                      Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=file:/Users/asinha/data/tpch-sf1/lineitem]], 
> selectionRoot=/Users/asinha/data/tpch-sf1/lineitem, numFiles=1, 
> columns=[`l_orderkey`, `l_quantity`, `l_extendedprice`]]])
> 01-04                HashToRandomExchange(dist0=[[$0]])
> 03-01                  Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpch-sf1/orders]], 
> selectionRoot=/Users/asinha/data/tpch-sf1/orders, numFiles=1, 
> columns=[`o_orderkey`]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-2170) Wrong result when joining to a subquery that has group-by, order-by and LIMIT

Reply via email to