[
https://issues.apache.org/jira/browse/SPARK-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410752#comment-16410752
]
Xiaoju Wu commented on SPARK-17570:
-----------------------------------
[~tejasp] When you join 3 tables with bucket counts 4, 8, and 12, whether a
bucketed join applies depends on the join order. Does this mean the
joinReordering rule needs to change?
> Avoid Hash and Exchange in Sort Merge join if bucketing factor is multiple
> for tables
> -------------------------------------------------------------------------------------
>
> Key: SPARK-17570
> URL: https://issues.apache.org/jira/browse/SPARK-17570
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Tejas Patil
> Priority: Minor
>
> In case of bucketed tables, Spark will avoid doing `Sort` and `Exchange` if
> the input tables and output table have the same number of buckets. However,
> unequal bucketing will always lead to `Sort` and `Exchange`. If the number of
> buckets in the output table is a factor of the number of buckets in the input
> table, we should be able to avoid `Sort` and `Exchange` and directly join those.
> e.g.
> Assume Input1, Input2 and Output are bucketed + sorted tables over the same
> columns but with different numbers of buckets. Input1 has 8 buckets, Input2
> has 12 buckets and Output has 4 buckets. Since hash-partitioning is done using
> modulus, if we JOIN buckets (0, 4) of Input1 and buckets (0, 4, 8) of Input2
> in the same task, it would give bucket 0 of the output table.
> {noformat}
> Input1 (0, 4) (1, 5) (2, 6) (3, 7)
> Input2 (0, 4, 8) (1, 5, 9) (2, 6, 10) (3, 7, 11)
> Output (0) (1) (2) (3)
> {noformat}
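
The grouping described above can be sketched in a few lines of Python. This is only an illustration, not Spark code: the helper name `group_buckets` is hypothetical, and it assumes a row's bucket id is a non-negative hash taken modulo the bucket count, with the same hash function for every table. Under that assumption, when the output bucket count M divides the input bucket count N, every row in input bucket b belongs to output bucket b mod M.

```python
from collections import defaultdict
from typing import Dict, List


def group_buckets(num_input_buckets: int, num_output_buckets: int) -> Dict[int, List[int]]:
    """Map each output bucket to the input buckets that feed it.

    Hypothetical helper for illustration only. It assumes
    bucket_id = hash(key) mod num_buckets with a non-negative result.
    """
    if num_input_buckets % num_output_buckets != 0:
        # Without divisibility, rows of one input bucket can land in
        # several output buckets, so the grouping is not well defined.
        raise ValueError("output bucket count must divide input bucket count")
    groups: Dict[int, List[int]] = defaultdict(list)
    for b in range(num_input_buckets):
        groups[b % num_output_buckets].append(b)
    return dict(groups)


if __name__ == "__main__":
    # Bucket counts taken from the example: 8 and 12 inputs, 4 outputs.
    for name, n in [("Input1", 8), ("Input2", 12)]:
        print(name, group_buckets(n, 4))
```

With 4 output buckets, a single task would read every input bucket listed under the same output bucket id from both tables. The divisibility check also shows why 8 and 12 cannot be grouped against each other directly: 8 does not divide 12.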