[ 
https://issues.apache.org/jira/browse/DRILL-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084488#comment-16084488
 ] 

Aman Sinha commented on DRILL-5468:
-----------------------------------

Agreed: the new logical plan, with the redundant aggregate removed, is the 
right plan. Previously, applying the (arbitrary) 10% reduction factor to each 
of the 2 aggregates was causing the broadcast plan to be chosen. 
Note that even with NDV statistics, query 18 was not getting a broadcast 
plan, as [~gparai] found when working on DRILL-1328 after creating stats on the 
Parquet table. The reason is that the group-by is on l_orderkey, which has a 
very high NDV. With higher scale factors, the new plan will likely perform 
better due to hash distribution. 
For the smaller scale factor, a workaround would be to raise the planner's 
broadcast threshold for this particular query.
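
To make the arithmetic behind the plan change concrete, here is a rough sketch in Python. It is not Drill's costing code: the function names are hypothetical, the lineitem row count for SF100 is an approximation, and the threshold value assumes planner.broadcast_threshold is at 10,000,000. It only shows how applying a fixed 10% factor once per aggregate can push the row estimate under or over the broadcast threshold.

```python
# Illustrative model only; Drill's real planner costing is more involved.
REDUCTION_FACTOR = 0.10            # the arbitrary per-aggregate factor discussed above
BROADCAST_THRESHOLD = 10_000_000   # assumed value of planner.broadcast_threshold
LINEITEM_ROWS_SF100 = 600_000_000  # approximate row count, an assumption


def estimated_rows(input_rows, num_aggregates, factor=REDUCTION_FACTOR):
    """Apply the fixed reduction factor once per aggregate in the plan."""
    rows = float(input_rows)
    for _ in range(num_aggregates):
        rows *= factor
    return rows


def choose_plan(input_rows, num_aggregates, threshold=BROADCAST_THRESHOLD):
    """Pick broadcast only when the estimate falls below the threshold."""
    est = estimated_rows(input_rows, num_aggregates)
    return "broadcast" if est < threshold else "hash-distribute"


# Two stacked aggregates: estimate is about 1% of the input rows.
old_plan = choose_plan(LINEITEM_ROWS_SF100, 2)

# Redundant aggregate removed: estimate is about 10% of the input rows.
new_plan = choose_plan(LINEITEM_ROWS_SF100, 1)

# Workaround: a higher (hypothetical) threshold for this one query.
raised_plan = choose_plan(LINEITEM_ROWS_SF100, 1, threshold=100_000_000)

print(old_plan, new_plan, raised_plan)
```

Under these assumptions the two-aggregate estimate lands below the threshold and the single-aggregate estimate lands above it, which matches the switch from broadcast to hash distribution described above. The raised threshold in the last call is only an example value, not a recommendation.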

> TPCH Q18 regressed ~3x due to execution plan changes
> ----------------------------------------------------
>
>                 Key: DRILL-5468
>                 URL: https://issues.apache.org/jira/browse/DRILL-5468
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.11.0
>         Environment: 10+1 node ucs-micro cluster RHEL6.4
>            Reporter: Dechang Gu
>            Assignee: Jinfeng Ni
>             Fix For: 1.11.0
>
>         Attachments: Q18_profile_gitid_841ead4, Q18_profile_gitid_adbf363
>
>
> In a regular regression test on Drill master (commit id 841ead4), TPCH Q18 on 
> the SF100 parquet dataset took ~81 secs, while the same query on 1.10.0 took 
> only ~27 secs. The query time on commit adbf363, which is right before 
> 841ead4, is ~32 secs.
> Profiles show that the plans for the query changed quite a bit (profiles will 
> be uploaded).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
