[
https://issues.apache.org/jira/browse/DRILL-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084522#comment-16084522
]
Jinfeng Ni commented on DRILL-5468:
-----------------------------------
[~amansinha100], you are right that the HAVING predicate
{{having sum(l_quantity) > 300}} is the one that reduces the rowcount the most.
Since it uses SUM(), having NDV would not help with estimating this HAVING
predicate.
For tpch-sf100, the rowcount is below the default broadcast threshold (10M).
Prior to DRILL-4678, the rowcount on the broadcast side was 300k; it
increased to 3M after DRILL-4678. Both are below 10M. I think it's the
relative cost comparison between broadcast and hash exchange that causes the
change of plan.
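To illustrate how a plan can flip even though both rowcounts stay under the threshold, here is a rough sketch of a relative cost comparison. The cost formulas, the probe-side rowcount (20M), and the receiver count (10) are assumptions made up for illustration; this is not Drill's actual cost model, and the direction of the flip shown here is only an example.

```python
# Toy model only: the formulas and inputs below are illustrative assumptions,
# not the cost model the Drill planner actually uses.
BROADCAST_THRESHOLD = 10_000_000  # default threshold (10M) quoted above


def pick_exchange(build_rows, probe_rows, receivers,
                  threshold=BROADCAST_THRESHOLD):
    """Return "broadcast" or "hash" for a join under a simplified cost model."""
    # Above the threshold, broadcast is not considered at all.
    if build_rows > threshold:
        return "hash"
    # Assumed broadcast cost: the build side is copied to every receiver.
    broadcast_cost = build_rows * receivers
    # Assumed hash-exchange cost: each row on both sides is shuffled once.
    hash_cost = build_rows + probe_rows
    return "broadcast" if broadcast_cost <= hash_cost else "hash"


if __name__ == "__main__":
    # 300k and 3M are the broadcast-side rowcounts from the comment;
    # probe_rows and receivers are hypothetical.
    for build_rows in (300_000, 3_000_000):
        choice = pick_exchange(build_rows, probe_rows=20_000_000, receivers=10)
        print(build_rows, choice)
```

With these made-up inputs the 300k estimate favors broadcast and the 3M estimate favors hash exchange, although neither estimate exceeds the 10M threshold.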
> TPCH Q18 regressed ~3x due to execution plan changes
> ----------------------------------------------------
>
> Key: DRILL-5468
> URL: https://issues.apache.org/jira/browse/DRILL-5468
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.11.0
> Environment: 10+1 node ucs-micro cluster RHEL6.4
> Reporter: Dechang Gu
> Assignee: Jinfeng Ni
> Fix For: 1.11.0
>
> Attachments: Q18_profile_gitid_841ead4, Q18_profile_gitid_adbf363
>
>
> In a regular regression test on Drill master (commit id 841ead4) TPCH Q18 on
> SF100 parquet dataset took ~81 secs, while the same query on 1.10.0 took only
> ~27 secs. The query time on the commit adbf363 which is right before 841ead4
> is ~32 secs.
> Profiles show that the plans for the query changed quite a bit (profiles will
> be uploaded).