[ https://issues.apache.org/jira/browse/DRILL-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084522#comment-16084522 ]
Jinfeng Ni commented on DRILL-5468:
-----------------------------------

[~amansinha100], you are right that the HAVING predicate {{having sum(l_quantity) > 300}} is the one that reduces the rowcount the most. Since it uses SUM(), having NDV would not help in estimating this HAVING predicate.

For tpch-sf100, the rowcount is below the default broadcast threshold (10M). Prior to DRILL-4678, the rowcount on the broadcast side was 300k; after DRILL-4678 it increased to 3M. Both are below 10M. I think it's the relative cost comparison between broadcast and hash exchange that causes the change of plan.

> TPCH Q18 regressed ~3x due to execution plan changes
> ----------------------------------------------------
>
>                 Key: DRILL-5468
>                 URL: https://issues.apache.org/jira/browse/DRILL-5468
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>   Affects Versions: 1.11.0
>         Environment: 10+1 node ucs-micro cluster RHEL6.4
>            Reporter: Dechang Gu
>            Assignee: Jinfeng Ni
>             Fix For: 1.11.0
>
>         Attachments: Q18_profile_gitid_841ead4, Q18_profile_gitid_adbf363
>
>
> In a regular regression test on Drill master (commit id 841ead4), TPCH Q18 on
> the SF100 parquet dataset took ~81 secs, while the same query on 1.10.0 took only
> ~27 secs. The query time on commit adbf363, which is right before 841ead4,
> is ~32 secs.
> Profiles show that the plans for the query changed quite a bit (profiles will be
> uploaded).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)