[https://issues.apache.org/jira/browse/DRILL-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084488#comment-16084488]
Aman Sinha commented on DRILL-5468:
-----------------------------------
I agree that the new logical plan, with the redundant aggregate removed, is the
right plan. Previously, applying the (arbitrary) 10% reduction factor to each of
the two aggregates was what caused the broadcast plan to be chosen.
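To make the arithmetic concrete, here is a minimal Python sketch of a simplified row-count model. It is not Drill's actual cost code. It assumes roughly 600M lineitem rows at SF100 and an illustrative broadcast threshold of 10M rows; both constants and the helper names `estimated_rows` and `prefers_broadcast` are hypothetical. Stacking the 10% factor twice cuts the estimate to 1% of the input, which lands under the threshold, while a single aggregate leaves it at 10%, which does not.

```python
# Simplified, hypothetical model of the planner's row-count estimate.
# This is NOT Drill's cost code; it only reproduces the arithmetic above.

AGG_REDUCTION_FACTOR = 0.10        # arbitrary 10% factor applied per aggregate (no stats)
BROADCAST_THRESHOLD = 10_000_000   # assumed broadcast threshold, in rows
LINEITEM_ROWS_SF100 = 600_000_000  # approximate lineitem row count at SF100


def estimated_rows(input_rows: int, num_aggregates: int) -> float:
    """Apply the fixed reduction factor once per aggregate in the plan."""
    return input_rows * AGG_REDUCTION_FACTOR ** num_aggregates


def prefers_broadcast(rows: float, threshold: int = BROADCAST_THRESHOLD) -> bool:
    """Broadcast the join input only if its estimated size is under the threshold."""
    return rows < threshold


# Old plan: redundant aggregate present, so the factor is applied twice.
old_plan = estimated_rows(LINEITEM_ROWS_SF100, num_aggregates=2)
# New plan: redundant aggregate removed, so the factor is applied once.
new_plan = estimated_rows(LINEITEM_ROWS_SF100, num_aggregates=1)

print(f"old: {old_plan:,.0f} rows, broadcast={prefers_broadcast(old_plan)}")
print(f"new: {new_plan:,.0f} rows, broadcast={prefers_broadcast(new_plan)}")
```

Run as-is, it prints `old: 6,000,000 rows, broadcast=True` followed by `new: 60,000,000 rows, broadcast=False`.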
Note that even with NDV statistics, query 18 was not getting a broadcast plan,
as [~gparai] found when working on DRILL-1328 after creating stats on the
Parquet table. The reason is that the group-by is on l_orderkey, which has a very
high NDV (number of distinct values). At higher scale factors, the new plan will
likely perform better because of hash distribution.
For the smaller scale factor, a workaround would be to raise the planner's
broadcast threshold for this particular query.
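The statistics case and the threshold workaround can be sketched with the same kind of simplified model. The assumption here (again, not Drill's cost code) is that with NDV statistics a GROUP BY is estimated to emit one row per distinct key; in TPC-H, l_orderkey has one distinct value per orders row, about 1.5M per scale factor. The 10M-row default and the 200M-row raised threshold are illustrative values only, and the helper names are hypothetical. In Drill the relevant knob is the `planner.broadcast_threshold` option, which can be changed for a single session with `ALTER SESSION SET`.

```python
# Hypothetical, simplified model: with statistics, the aggregate's output
# estimate is the NDV of the group-by key instead of a fixed reduction factor.
# Not Drill's actual cost code.

ORDERS_PER_SF = 1_500_000       # TPC-H orders rows per scale factor = NDV of l_orderkey
DEFAULT_THRESHOLD = 10_000_000  # assumed default broadcast threshold, in rows
RAISED_THRESHOLD = 200_000_000  # hypothetical raised threshold (the workaround)


def group_by_estimate(ndv: int) -> int:
    """With NDV stats, a GROUP BY produces about one row per distinct key."""
    return ndv


def prefers_broadcast(rows: int, threshold: int) -> bool:
    """Broadcast the join input only if its estimated size is under the threshold."""
    return rows < threshold


scale_factor = 100
rows = group_by_estimate(ORDERS_PER_SF * scale_factor)  # ~150M distinct l_orderkey values

print(prefers_broadcast(rows, DEFAULT_THRESHOLD))  # high NDV keeps broadcast off
print(prefers_broadcast(rows, RAISED_THRESHOLD))   # raised threshold re-enables it
```

This prints `False` and then `True`: the high-NDV estimate alone rules out broadcast, and only raising the threshold for the query brings the broadcast plan back.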
> TPCH Q18 regressed ~3x due to execution plan changes
> ----------------------------------------------------
>
> Key: DRILL-5468
> URL: https://issues.apache.org/jira/browse/DRILL-5468
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.11.0
> Environment: 10+1 node ucs-micro cluster RHEL6.4
> Reporter: Dechang Gu
> Assignee: Jinfeng Ni
> Fix For: 1.11.0
>
> Attachments: Q18_profile_gitid_841ead4, Q18_profile_gitid_adbf363
>
>
> In a regular regression test on Drill master (commit 841ead4), TPCH Q18 on the
> SF100 Parquet dataset took ~81 secs, while the same query on 1.10.0 took only
> ~27 secs. On commit adbf363, immediately before 841ead4, the query took
> ~32 secs.
> The profiles show that the query plan changed significantly (profiles will be
> uploaded).
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)