[jira] [Commented] (DRILL-5468) TPCH Q18 regressed ~3x due to execution plan changes

Jinfeng Ni (JIRA) Wed, 12 Jul 2017 12:30:32 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084522#comment-16084522
 ]


Jinfeng Ni commented on DRILL-5468:
-----------------------------------

[~amansinha100], you are right that the HAVING predicate 
{{having sum(l_quantity) > 300}} is the one that reduces the rowcount most.  
Since it uses SUM(), having NDV would not help in estimating the selectivity of 
this HAVING predicate. 

For tpch-sf100, the rowcount is below the default broadcast threshold (10M).  
Prior to DRILL-4678, the rowcount on the broadcast side was 300k; it 
increased to 3M after DRILL-4678. Both are below 10M, so the threshold alone 
does not explain the difference.  I think it's the relative cost comparison 
between broadcast and hash exchange that causes the change of plan. 
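
To illustrate how a plan can flip even though both rowcounts stay under the 
threshold, here is a minimal sketch. The cost formulas, the probe-side 
rowcount (15M) and the number of receiving fragments (10) are assumptions made 
up for illustration, not Drill's actual cost model or numbers from the 
profiles; only the 10M threshold and the 300k / 3M rowcounts come from the 
discussion above. The sketch also does not claim which direction the real plan 
changed, only that a 10x change in the estimate can reverse the comparison.

```python
# Hypothetical, simplified cost model. NOT Drill's real planner formulas.

BROADCAST_THRESHOLD = 10_000_000  # default broadcast threshold cited above (10M rows)


def broadcast_eligible(build_rows, threshold=BROADCAST_THRESHOLD):
    """Broadcast is only considered when the build side is under the threshold."""
    return build_rows < threshold


def broadcast_cost(build_rows, receivers):
    """Assumed cost: every receiving fragment gets a full copy of the build side."""
    return build_rows * receivers


def hash_exchange_cost(build_rows, probe_rows):
    """Assumed cost: both sides are redistributed once on the join key."""
    return build_rows + probe_rows


def choose_exchange(build_rows, probe_rows, receivers):
    """Pick the cheaper exchange among the eligible ones."""
    if not broadcast_eligible(build_rows):
        return "hash"
    if broadcast_cost(build_rows, receivers) < hash_exchange_cost(build_rows, probe_rows):
        return "broadcast"
    return "hash"


# Assumed values for illustration only.
PROBE_ROWS = 15_000_000
RECEIVERS = 10

for estimate in (300_000, 3_000_000):
    print(estimate, broadcast_eligible(estimate),
          choose_exchange(estimate, PROBE_ROWS, RECEIVERS))
```

Under these assumed numbers both estimates pass the threshold check, yet the 
cheaper exchange differs between the 300k and the 3M estimate, which is the 
kind of relative-cost effect described above.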



> TPCH Q18 regressed ~3x due to execution plan changes
> ----------------------------------------------------
>
>                 Key: DRILL-5468
>                 URL: https://issues.apache.org/jira/browse/DRILL-5468
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.11.0
>         Environment: 10+1 node ucs-micro cluster RHEL6.4
>            Reporter: Dechang Gu
>            Assignee: Jinfeng Ni
>             Fix For: 1.11.0
>
>         Attachments: Q18_profile_gitid_841ead4, Q18_profile_gitid_adbf363
>
>
> In a regular regression test on Drill master (commit id 841ead4) TPCH Q18 on 
> SF100 parquet dataset took ~81 secs, while the same query on 1.10.0 took only 
> ~27 secs.  The query time on the commit adbf363 which is right before 841ead4 
> is ~32 secs.
> Profiles show that the plans for the query changed quite a bit (profiles will 
> be uploaded).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
