Josh Rosen created SPARK-12757:
----------------------------------

             Summary: Use reference counting to prevent blocks from being evicted during reads
                 Key: SPARK-12757
                 URL: https://issues.apache.org/jira/browse/SPARK-12757
             Project: Spark
          Issue Type: Improvement
          Components: Block Manager
            Reporter: Josh Rosen
            Assignee: Josh Rosen


As a prerequisite to off-heap caching of blocks, we need a mechanism to 
prevent pages / blocks from being evicted while they are being read. With 
on-heap objects, evicting a block while it is being read merely leads to 
memory-accounting problems (because we assume that an evicted block is a 
candidate for garbage-collection, which will not be true during a read), but 
with off-heap memory this will lead to either data corruption or segmentation 
faults.

To address this, we should add a reference-counting mechanism to track which 
blocks/pages are being read in order to prevent them from being evicted 
prematurely. I propose to do this in two phases. In phase one, we add a safe, 
conservative approach in which all BlockManager.get*() calls implicitly 
increment the reference count of blocks and tasks' references are 
automatically released upon task completion. This will be correct but may 
hurt performance because it will prevent legitimate block evictions. In 
phase two, we incrementally add explicit release() calls so that blocks 
which are no longer being read can be evicted again. The latter change may 
need to touch many different components, which is why I propose to do it 
separately in order to make the changes easier to reason about and review.
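
To make the two phases concrete, here is a minimal sketch of the idea in 
Python. It is not Spark's BlockManager: the class RefCountedBlockStore and 
the methods put, get, release, release_all_for_task and evict are 
hypothetical names chosen for illustration, and the sketch is 
single-threaded, so it ignores the locking a real implementation would need.

```python
class RefCountedBlockStore:
    """Toy block store illustrating reference-counted eviction (hypothetical API)."""

    def __init__(self):
        self._blocks = {}       # block_id -> data
        self._ref_counts = {}   # block_id -> total outstanding references
        self._task_pins = {}    # task_id -> {block_id: references held by that task}

    def put(self, block_id, data):
        self._blocks[block_id] = data
        self._ref_counts.setdefault(block_id, 0)

    def get(self, task_id, block_id):
        """Phase one: every read implicitly takes a reference for the task."""
        if block_id not in self._blocks:
            return None
        self._ref_counts[block_id] += 1
        pins = self._task_pins.setdefault(task_id, {})
        pins[block_id] = pins.get(block_id, 0) + 1
        return self._blocks[block_id]

    def release(self, task_id, block_id):
        """Phase two: a reader explicitly drops one reference before the task ends."""
        pins = self._task_pins.get(task_id, {})
        if pins.get(block_id, 0) == 0:
            raise ValueError("task holds no reference to this block")
        pins[block_id] -= 1
        if pins[block_id] == 0:
            del pins[block_id]
        self._ref_counts[block_id] -= 1

    def release_all_for_task(self, task_id):
        """Phase one safety net: drop every reference a task still holds."""
        for block_id, count in self._task_pins.pop(task_id, {}).items():
            self._ref_counts[block_id] -= count

    def evict(self, block_id):
        """Evict only if nobody is reading the block; return True on success."""
        if block_id not in self._blocks or self._ref_counts[block_id] > 0:
            return False
        del self._blocks[block_id]
        del self._ref_counts[block_id]
        return True


store = RefCountedBlockStore()
store.put("block-a", b"payload")
store.get(task_id=1, block_id="block-a")   # task 1 is now reading block-a
blocked = store.evict("block-a")           # refused while the reference is held
store.release_all_for_task(1)              # task completion frees its references
evicted = store.evict("block-a")           # now allowed
```

With only phase one, a block stays pinned until release_all_for_task runs, 
even if the reader finished long before; the explicit release() call is what 
phase two would add at individual call sites to shorten that window.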



