[ https://issues.apache.org/jira/browse/HIVE-13750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290128#comment-15290128 ]
Ashutosh Chauhan commented on HIVE-13750:
-----------------------------------------
Compiler-side changes look good. I left some comments on RB.
However, I wonder whether this breaks any assumption the FS operator makes
about the order in which it expects rows to arrive to be written out. Earlier,
all rows for a given partition in a Reducer had to arrive sorted in a single
batch; now they may arrive sorted but in multiple batches. [~prasanth_j] Can
you also please take a look at the patch and comment?
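
To make the ordering concern concrete, here is a minimal sketch in Python. It
is not Hive code and does not model the FS operator itself; the helper
`partition_batches`, the sample rows, and the column names `key` and `ds` are
all hypothetical. It only shows how rows for one partition can stay sorted yet
arrive in more than one contiguous run once the partition column is no longer
the leading sort key.

```python
from itertools import groupby


def partition_batches(rows, partition_index):
    """Return the partition value of each contiguous run of rows, in arrival order."""
    return [part for part, _ in groupby(rows, key=lambda row: row[partition_index])]


# Hypothetical rows of the form (key, ds), where ds is the partition column.
rows = [("b", "d1"), ("a", "d2"), ("a", "d1")]

# Partition column leads the sort: every partition arrives as one batch.
by_partition = sorted(rows, key=lambda r: (r[1], r[0]))

# Query key leads, partition column follows: rows are still sorted,
# but the same partition can show up in several separate batches.
by_key_then_partition = sorted(rows)

single = partition_batches(by_partition, 1)            # ['d1', 'd2']
multiple = partition_batches(by_key_then_partition, 1)  # ['d1', 'd2', 'd1']
```

In the second ordering, partition `d1` is seen twice, so a writer that closes
its output as soon as the partition value changes would need to handle that.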
> Avoid additional shuffle stage created by Sorted Dynamic Partition Optimizer
> when possible
> ------------------------------------------------------------------------------------------
>
> Key: HIVE-13750
> URL: https://issues.apache.org/jira/browse/HIVE-13750
> Project: Hive
> Issue Type: Improvement
> Components: Physical Optimizer
> Affects Versions: 2.1.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-13750.01.patch, HIVE-13750.02.patch,
> HIVE-13750.patch, HIVE-13750.patch
>
>
> Extend ReduceDedup to remove the additional shuffle stage created by the
> sorted dynamic partition optimizer when possible, thus avoiding unnecessary
> work.
> By [~ashutoshc]:
> {quote}
> Currently, if the config is on, Sorted Dynamic Partition Optimizer (SDPO)
> unconditionally adds an extra shuffle stage. If the sort columns of the
> previous shuffle and the partitioning columns of the table match, the reduce
> sink deduplication optimizer removes the extra shuffle stage, thus bringing
> the overhead down to zero. However, if they don’t match, we end up doing an
> extra shuffle. This can be improved, since we can add the table partition
> columns as sort columns on the earlier shuffle and avoid this extra shuffle.
> This ensures that in cases where the query already has a shuffle stage, we
> are not shuffling data again.
> {quote}
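
The decision described in the quote above can be sketched as a toy model in
Python. This is an illustration only, not the patch's implementation: the names
`ShufflePlan` and `plan_shuffle` are hypothetical, the real optimizer applies
further conditions ("when possible") that are not modeled here, and appending
the partition columns after the existing sort columns is an assumption about
ordering.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ShufflePlan:
    sort_columns: List[str]     # sort columns used by the earlier shuffle
    needs_extra_shuffle: bool   # whether a second shuffle stage remains


def plan_shuffle(prev_sort_columns: Sequence[str],
                 partition_columns: Sequence[str]) -> ShufflePlan:
    """Toy model: reuse an existing shuffle instead of adding another one."""
    prev = list(prev_sort_columns)
    parts = list(partition_columns)
    if prev == parts:
        # Columns already match: the extra stage is redundant and can be dropped.
        return ShufflePlan(prev, needs_extra_shuffle=False)
    # Columns differ: add the missing partition columns as sort columns on the
    # earlier shuffle (appended here by assumption) so no second shuffle is needed.
    merged = prev + [c for c in parts if c not in prev]
    return ShufflePlan(merged, needs_extra_shuffle=False)


matching = plan_shuffle(["ds"], ["ds"])
extended = plan_shuffle(["customer_id"], ["ds"])
```

Here `customer_id` and `ds` are made-up column names; `extended.sort_columns`
ends up as `["customer_id", "ds"]`, with no second shuffle stage in either case.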