Davies Liu created SPARK-2538:
---------------------------------

             Summary: External aggregation in Python
                 Key: SPARK-2538
                 URL: https://issues.apache.org/jira/browse/SPARK-2538
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 1.0.0, 1.0.1
            Reporter: Davies Liu
             Fix For: 1.0.1, 1.0.0


For huge reduce tasks, user will got out of memory exception when all the data 
can not fit in memory.

It should put some of the data into disks and then merge them together, just 
like what we do in Scala. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to