phaniarnab opened a new pull request, #1834:
URL: https://github.com/apache/systemds/pull/1834

   This patch extends the lineage cache eviction policies to support RDDs 
persisted at the executors.
   - We checkpoint an RDD on its second cache hit (to reduce cache pollution).
   - While checkpointing, we rely on worst-case size estimates and later 
update the eviction data structures with the actual size once the RDDs are 
persisted.
   - We split the Spark operators into two groups, one for expensive, 
shuffle-based operations, and another for map-based operations. For the scoring 
function, we assume the first group is 2x more expensive.
   - We also track the reference counts of RDDs and use them in the scoring. 
More references (many consumers) indicate higher importance.
   - We reduce the score by one hit count if we collect a persisted RDD. This 
helps evict intermediates that are cached at multiple locations.
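
The bookkeeping described in the list above can be illustrated with a small, self-contained sketch. This is not the code from the patch: the class `RDDEntry`, its fields and method names, and in particular the formula `weight * (hits + ref_count)` are hypothetical and chosen only for illustration. The only values taken from the description are the checkpoint on the second hit, the 2x weight for the shuffle-based group, the use of reference counts, and the one-hit reduction on collect. The sketch assumes that entries with lower scores are evicted first.

```python
from dataclasses import dataclass

# From the description: the shuffle-based group is assumed 2x more expensive.
SHUFFLE_WEIGHT = 2.0
MAP_WEIGHT = 1.0
# From the description: an RDD is checkpointed on its second cache hit.
CHECKPOINT_AFTER_HITS = 2


@dataclass
class RDDEntry:
    """Hypothetical cache entry for one RDD (illustrative names only)."""
    op_group: str          # "shuffle" or "map"
    est_size: float        # worst-case size estimate
    hits: int = 0          # cache hits, reduced by one on collect
    ref_count: int = 0     # number of consumers referencing this RDD
    persisted: bool = False
    size: float = 0.0      # size used by the eviction data structures

    def on_cache_hit(self) -> bool:
        """Record a hit; return True if this hit triggers the checkpoint."""
        self.hits += 1
        if not self.persisted and self.hits >= CHECKPOINT_AFTER_HITS:
            self.persisted = True
            self.size = self.est_size  # start from the worst-case estimate
            return True
        return False

    def on_materialized(self, actual_size: float) -> None:
        """Replace the estimate with the actual size once persisted."""
        self.size = actual_size

    def on_collect(self) -> None:
        """A collected, persisted RDD also lives elsewhere: drop one hit."""
        if self.persisted:
            self.hits = max(0, self.hits - 1)

    def score(self) -> float:
        """Assumed formula: operator-group weight times (hits + references)."""
        weight = SHUFFLE_WEIGHT if self.op_group == "shuffle" else MAP_WEIGHT
        return weight * (self.hits + self.ref_count)


if __name__ == "__main__":
    entry = RDDEntry(op_group="shuffle", est_size=1000.0, ref_count=3)
    entry.on_cache_hit()            # first hit: not yet checkpointed
    entry.on_cache_hit()            # second hit: checkpointed
    entry.on_materialized(640.0)    # actual size replaces the estimate
    print(entry.score())
    entry.on_collect()              # collecting lowers the score
    print(entry.score())
```

In this sketch the size is tracked but deliberately left out of the score, because the description does not say how size enters the scoring function.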


