Josh Rosen created SPARK-13980:
----------------------------------

             Summary: Incrementally serialize blocks while unrolling them in MemoryStore
                 Key: SPARK-13980
                 URL: https://issues.apache.org/jira/browse/SPARK-13980
             Project: Spark
          Issue Type: Improvement
          Components: Block Manager
            Reporter: Josh Rosen
            Assignee: Josh Rosen


When a block is persisted in the MemoryStore at a serialized storage level, the
current MemoryStore.putIterator() code first unrolls the entire iterator into
an array of Java objects in memory, then serializes an iterator obtained from
that unrolled array. This is inefficient and doubles our peak memory
requirements, because the unrolled objects and their serialized form are held
in memory at the same time. Instead, I think that we should serialize blocks
incrementally while unrolling them. The downside of incremental serialization
is that we will need to deserialize the partially-unrolled data if there is
not enough space to finish unrolling the block and the block cannot be dropped
to disk. However, I'm hoping that the memory efficiency improvements will
outweigh the cost of the extra serialization and deserialization work in that
hopefully-rare case.


