[ https://issues.apache.org/jira/browse/SPARK-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077075#comment-14077075 ]
Davies Liu commented on SPARK-2023:
-----------------------------------
In most cases, the result of reduce will be small, so collecting these small
per-partition results and then reducing them on the driver will not be a bottleneck.
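A minimal pure-Python sketch of the two-phase pattern under discussion may help. Each partition is first folded locally to at most one value, and the driver then reduces only those partial results. Partitions are modeled here as plain lists, and the names `reduce_partition` and `two_phase_reduce` are hypothetical. This is an illustration of the idea, not PySpark's actual implementation. As with Spark's `reduce`, `f` is assumed to be associative and commutative, because partial results may be combined in any order.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def reduce_partition(f: Callable[[T, T], T], items: Iterable[T]) -> Iterator[T]:
    """Hypothetical map-side step: fold one partition into at most one value.

    An empty partition yields nothing, so it sends no data to the driver.
    """
    it = iter(items)
    try:
        acc = next(it)
    except StopIteration:
        return
    for x in it:
        acc = f(acc, x)
    yield acc


def two_phase_reduce(f: Callable[[T, T], T], partitions: List[List[T]]) -> T:
    """Hypothetical two-phase reduce over partitions modeled as lists.

    Phase 1 runs per partition ("map side"); phase 2 runs in the caller,
    standing in for the driver, over at most one value per partition.
    """
    # Phase 1: one partial result per non-empty partition.
    partials = [p for part in partitions for p in reduce_partition(f, part)]
    # Phase 2: the "driver" only sees len(partials) values.
    if not partials:
        raise ValueError("cannot reduce an empty collection")
    return reduce(f, partials)


parts = [[1, 2, 3], [], [4, 5], [6]]
print(two_phase_reduce(lambda a, b: a + b, parts))  # 21
```

The amount of data the driver receives scales with the number of partitions rather than the number of elements. This is why the final driver-side step stays cheap when each partial result is small.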
> PySpark reduce does a map side reduce and then sends the results to the
> driver for final reduce, instead do this more like Scala Spark.
> ---------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-2023
> URL: https://issues.apache.org/jira/browse/SPARK-2023
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Reporter: holdenk
>
> PySpark reduce does a map-side reduce and then sends the results to the
> driver for the final reduce; instead, do this more like Scala Spark. The
> current implementation could be a bottleneck.