Andrew Or created SPARK-1762:
--------------------------------

             Summary: Add functionality to pin RDDs in cache
                 Key: SPARK-1762
                 URL: https://issues.apache.org/jira/browse/SPARK-1762
             Project: Spark
          Issue Type: Improvement
    Affects Versions: 1.0.0
            Reporter: Andrew Or
             Fix For: 1.1.0


Right now, all RDDs are created equal, and there is no mechanism to mark a 
certain RDD as more important than the rest. This is a problem if the fraction 
of memory reserved for RDD storage (spark.storage.memoryFraction) is small, 
because caching just a few additional RDDs can evict more important ones.
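
To make the eviction problem concrete, here is a minimal toy sketch in Python. 
It is not Spark code: the class PinnableLRUCache and its put/touch methods are 
hypothetical names invented for illustration, and it assumes a simple 
least-recently-used policy over whole RDDs rather than Spark's actual block 
management.

```python
from collections import OrderedDict


class PinnableLRUCache:
    """Toy LRU cache of whole RDDs (hypothetical, not a Spark API)."""

    def __init__(self, capacity):
        self.capacity = capacity      # total size budget, arbitrary units
        self.used = 0                 # size currently occupied
        self.blocks = OrderedDict()   # rdd_id -> size, least recently used first
        self.pinned = set()           # rdd_ids that must never be evicted

    def touch(self, rdd_id):
        """Mark an RDD as recently used."""
        self.blocks.move_to_end(rdd_id)

    def put(self, rdd_id, size, pin=False):
        """Cache an RDD; return True if stored, False if it cannot fit."""
        # Replacing an existing entry drops the old copy first.
        if rdd_id in self.blocks:
            self.used -= self.blocks.pop(rdd_id)
            self.pinned.discard(rdd_id)

        # Only unpinned entries are eviction candidates, oldest first.
        victims = [k for k in self.blocks if k not in self.pinned]
        freeable = sum(self.blocks[k] for k in victims)
        if size > self.capacity - self.used + freeable:
            return False              # would require evicting a pinned RDD

        for k in victims:
            if self.used + size <= self.capacity:
                break
            self.used -= self.blocks.pop(k)

        self.blocks[rdd_id] = size
        self.used += size
        if pin:
            self.pinned.add(rdd_id)
        return True


# Without pinning, caching two small RDDs evicts the important one.
plain = PinnableLRUCache(capacity=100)
plain.put("important", 60)
plain.put("a", 30)
plain.put("b", 30)
print("important" in plain.blocks)    # False

# With pinning, the least recently used unpinned RDD is evicted instead.
cache = PinnableLRUCache(capacity=100)
cache.put("important", 60, pin=True)
cache.put("a", 30)
cache.put("b", 30)
print(list(cache.blocks))             # ['important', 'b']
```

In this sketch a put that could only succeed by evicting a pinned RDD is 
rejected outright; whether the real feature should reject, spill, or fall back 
in that case is a design decision this issue leaves open.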

A side effect of this feature is that we can more safely set a smaller 
spark.storage.memoryFraction if we know how large our important RDDs are, 
without having to worry about them being evicted. This frees up more memory 
for shuffles, for instance, and helps avoid disk spills.
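
As a rough illustration of that sizing argument, the following Python sketch 
computes the smallest storage fraction that still holds a set of pinned RDDs. 
The function name and the heap and RDD sizes are made-up assumptions, and the 
calculation ignores any safety margins or uneven partition placement that a 
real deployment would have to account for.

```python
def min_storage_fraction(executor_heap_mb, pinned_rdd_mb):
    """Smallest fraction of the heap that still fits every pinned RDD.

    Hypothetical helper for illustration; not part of Spark.
    """
    if pinned_rdd_mb > executor_heap_mb:
        raise ValueError("pinned RDDs do not fit in the heap at all")
    return pinned_rdd_mb / executor_heap_mb


heap_mb = 8192      # assumed executor heap size
pinned_mb = 2048    # assumed total size of the important (pinned) RDDs

fraction = min_storage_fraction(heap_mb, pinned_mb)
remaining_mb = heap_mb - pinned_mb

print(fraction)       # 0.25
print(remaining_mb)   # 6144
```

With these assumed numbers, a storage fraction of 0.25 would be enough for the 
pinned RDDs, leaving the remaining heap for shuffles and other work.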



--
This message was sent by Atlassian JIRA
(v6.2#6252)
