[jira] [Commented] (SPARK-22276) Unnecessary repartitioning

Liang-Chi Hsieh (JIRA) Sun, 15 Oct 2017 20:22:25 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-22276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205400#comment-16205400
 ]


Liang-Chi Hsieh commented on SPARK-22276:
-----------------------------------------

Can you provide a simple example to reproduce this issue? That would make it 
easier to verify the problem.

> Unnecessary repartitioning
> --------------------------
>
>                 Key: SPARK-22276
>                 URL: https://issues.apache.org/jira/browse/SPARK-22276
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.2.0
>            Reporter: Fernando Pereira
>
> When a DataFrame is sorted, it is partitioned with a RangePartitioner.
> If we later aggregate by the exact same fields on which the sort was applied, 
> there is a new (apparently useless) Exchange that repartitions the data with a 
> HashPartitioner.
> In my use case the groupBy exchange is still very costly as the aggregate 
> function won't reduce the data volume.
> Is there any reason why groupBy always shuffles data, or could this be 
> improved? 
> Is there currently a way to work around this for the moment, without 
> resorting to mapPartitions?
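
To make the reporter's reasoning concrete, here is a minimal sketch in plain Python (not Spark code) of why a range partitioning on the grouping key already places all rows with equal keys in the same partition. The names range_partition, aggregate_per_partition and global_aggregate, the sample rows and the bounds are all hypothetical and only illustrate the idea; they say nothing about how Spark's planner actually decides whether an Exchange is required.

```python
from bisect import bisect_left
from collections import defaultdict


def range_partition(rows, bounds):
    """Split (key, value) rows into len(bounds) + 1 partitions by key range.

    `bounds` is a sorted list of inclusive upper bounds (hypothetical values).
    Equal keys always map to the same partition index.
    """
    parts = [[] for _ in range(len(bounds) + 1)]
    for key, value in rows:
        parts[bisect_left(bounds, key)].append((key, value))
    return parts


def aggregate_per_partition(parts):
    """Sum values per key inside each partition, with no data movement."""
    results = []
    for part in parts:
        sums = defaultdict(int)
        for key, value in part:
            sums[key] += value
        results.append(dict(sums))
    return results


def global_aggregate(rows):
    """Reference result: sum values per key over the whole data set."""
    sums = defaultdict(int)
    for key, value in rows:
        sums[key] += value
    return dict(sums)


# Sample data: keys 0..6, values 0..49 (made up for illustration).
rows = [(k % 7, k) for k in range(50)]
parts = range_partition(rows, bounds=[1, 4])
partials = aggregate_per_partition(parts)

# No key appears in more than one partition, so the per-partition
# results can simply be concatenated without a second shuffle.
merged = {}
for partial in partials:
    assert not (merged.keys() & partial.keys())
    merged.update(partial)

assert merged == global_aggregate(rows)
```

In this toy model the per-partition aggregation already yields the final result, which is the property the report relies on when it calls the second Exchange apparently useless. Whether the same holds for a given Spark plan would still need to be checked against the actual physical plan.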



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
