Andrew Or created SPARK-1762:
--------------------------------
Summary: Add functionality to pin RDDs in cache
Key: SPARK-1762
URL: https://issues.apache.org/jira/browse/SPARK-1762
Project: Spark
Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Andrew Or
Fix For: 1.1.0
Right now, all RDDs are created equal, and there is no mechanism to mark a
certain RDD as more important than the rest. This is a problem if the fraction
of memory reserved for RDD storage (spark.storage.memoryFraction) is small,
because caching just a few RDDs can evict more important ones.
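To make the eviction problem concrete, here is a toy simulation in plain
Python. It is only a sketch of the idea, not Spark code: the class
PinnableLruCache, its methods, and the pin flag are all hypothetical names
invented for illustration, and the sizes are arbitrary units. It assumes a
size-bounded cache that evicts least-recently-used entries first and simply
skips pinned entries when choosing what to evict.

```python
from collections import OrderedDict


class PinnableLruCache:
    """Toy size-bounded LRU cache whose pinned entries are never evicted.

    Hypothetical illustration only; this is not Spark's storage layer.
    """

    def __init__(self, capacity):
        self.capacity = capacity      # total size budget, arbitrary units
        self.used = 0                 # size currently occupied
        self.entries = OrderedDict()  # key -> size, least recently used first
        self.pinned = set()           # keys that must never be evicted

    def _drop(self, key):
        self.used -= self.entries.pop(key)
        self.pinned.discard(key)

    def get(self, key):
        """Mark `key` as recently used; return whether it is cached."""
        if key not in self.entries:
            return False
        self.entries.move_to_end(key)
        return True

    def put(self, key, size, pin=False):
        """Cache `key`; return False if it cannot fit without evicting a pinned entry."""
        if key in self.entries:
            self._drop(key)  # re-inserting replaces the old copy
        evictable = [k for k in self.entries if k not in self.pinned]
        reclaimable = sum(self.entries[k] for k in evictable)
        if self.used + size - reclaimable > self.capacity:
            return False
        # Evict unpinned entries in LRU order until the new entry fits.
        for k in evictable:
            if self.used + size <= self.capacity:
                break
            self._drop(k)
        self.entries[key] = size
        self.used += size
        if pin:
            self.pinned.add(key)
        return True


def surviving(pin_important):
    """Cache one large 'important' entry, then a few small ones; report what is left."""
    cache = PinnableLruCache(capacity=10)
    cache.put("important", 6, pin=pin_important)
    for i in range(3):
        cache.put("scratch-%d" % i, 3)
    return sorted(cache.entries)


print(surviving(pin_important=False))  # ['scratch-0', 'scratch-1', 'scratch-2']
print(surviving(pin_important=True))   # ['important', 'scratch-2']
```

Without pinning, three small entries are enough to push the important one out.
With pinning, the small entries compete only among themselves for the
remaining space.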
A side effect of this feature is that we could more safely set a smaller
spark.storage.memoryFraction if we know how large our important RDDs are,
without having to worry about them being evicted. This would leave more
memory for shuffles, for instance, and help avoid disk spills.
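The trade-off can be sketched as simple arithmetic. The helper below is
hypothetical and deliberately simplified: it treats the storage and shuffle
shares as plain percentages of the heap (ignoring any safety margins Spark
applies on top), and it assumes the shuffle share is the one controlled by
spark.shuffle.memoryFraction. The heap size and percentages are illustrative
values, not Spark defaults or recommendations.

```python
def split_heap(heap_mb, storage_pct, shuffle_pct):
    """Return (storage_mb, shuffle_mb) for the given percentages of the heap.

    Hypothetical helper for illustration; not part of Spark.
    """
    if storage_pct < 0 or shuffle_pct < 0 or storage_pct + shuffle_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return heap_mb * storage_pct // 100, heap_mb * shuffle_pct // 100


# Illustrative numbers: an 8000 MB heap and about 2000 MB of important RDDs.
print(split_heap(8000, 60, 20))  # (4800, 1600): large storage share to be safe
print(split_heap(8000, 30, 50))  # (2400, 4000): important RDDs pinned, more for shuffles
```

If the roughly 2000 MB of important RDDs could be pinned, the smaller storage
share in the second call would still hold them, and the memory freed up would
go to shuffles.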
--
This message was sent by Atlassian JIRA
(v6.2#6252)