[
https://issues.apache.org/jira/browse/IMPALA-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aman Sinha resolved IMPALA-10287.
---------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed
> Distribution strategy is sub-optimal for certain queries
> --------------------------------------------------------
>
> Key: IMPALA-10287
> URL: https://issues.apache.org/jira/browse/IMPALA-10287
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.4.0
> Reporter: Aman Sinha
> Assignee: Aman Sinha
> Priority: Major
> Fix For: Impala 4.0
>
>
> I ran a simplified query (extracted from q78 of TPC-DS) on a 600GB dataset on
> an 8 node cluster. I forced the distribution strategy for the left outer join
> and compared Broadcast vs Hash Partition for different values of mt_dop. The
> example query and results are shown below (elapsed times are in seconds):
> {noformat}
> Query (with shuffle or broadcast hint):
> select count(*)
> from store_sales
> left join [shuffle] store_returns on sr_ticket_number=ss_ticket_number
> and ss_item_sk=sr_item_sk
> join date_dim on ss_sold_date_sk = d_date_sk
> where sr_ticket_number is null
> and d_year=2002;
> {noformat}
> ||mt_dop||Broadcast||Partition||
> |1|45|15|
> |2|37|9|
> |4|33|5|
> |8|31|4|
> |12|31|4|
> Given the nearly 7.5x speedup for partition distribution at mt_dop = 12
> (which is the default), it indicates that the cost formula comparing the
> broadcast vs partition needs to be modified to take into account the mt_dop.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)