[ https://issues.apache.org/jira/browse/SPARK-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077075#comment-14077075 ]
Davies Liu commented on SPARK-2023:
-----------------------------------

In most cases, the result of reduce will be small, so collecting these small results from each partition and then reducing them on the driver will not be a bottleneck.

> PySpark reduce does a map side reduce and then sends the results to the
> driver for final reduce, instead do this more like Scala Spark.
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2023
>                 URL: https://issues.apache.org/jira/browse/SPARK-2023
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>            Reporter: holdenk
>
> PySpark reduce does a map side reduce and then sends the results to the
> driver for final reduce, instead do this more like Scala Spark. The current
> implementation could be a bottleneck.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
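
The two-stage reduce discussed in this issue can be illustrated with a minimal sketch. This is not PySpark's actual implementation: it uses plain Python lists to stand in for partitions, and the function name `two_stage_reduce` is hypothetical. It only shows the shape of the pattern, a local reduce per partition followed by a final reduce over the partial results.

```python
from functools import reduce
from operator import add


def two_stage_reduce(partitions, f):
    """Reduce each partition locally, then reduce the partial results.

    `partitions` is a plain list of lists standing in for an RDD's
    partitions (an assumption made for illustration only).
    """
    # Stage 1 (would run on the executors): one partial result per
    # non-empty partition.
    partials = [reduce(f, part) for part in partitions if part]
    if not partials:
        raise ValueError("cannot reduce an empty collection")
    # Stage 2 (would run on the driver): combine the partial results.
    return reduce(f, partials)


partitions = [[1, 2, 3], [4, 5], [], [6]]
total = two_stage_reduce(partitions, add)
print(total)  # 21
```

In this sketch the final step sees only one value per non-empty partition, so when each partial result is small the last reduce is cheap. It could become a bottleneck when the partial results are large or the number of partitions is very high.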