Josh Rosen created SPARK-13980:
----------------------------------
Summary: Incrementally serialize blocks while unrolling them in MemoryStore
Key: SPARK-13980
URL: https://issues.apache.org/jira/browse/SPARK-13980
Project: Spark
Issue Type: Improvement
Components: Block Manager
Reporter: Josh Rosen
Assignee: Josh Rosen
When a block is persisted in the MemoryStore at a serialized storage level, the
current MemoryStore.putIterator() code unrolls the entire iterator as Java
objects in memory, then serializes an iterator obtained from the unrolled
array. This is inefficient and doubles our peak memory requirements, because
the unrolled objects and their serialized form are held in memory at the same
time. Instead, I think that we should incrementally serialize blocks while
unrolling them. A downside to incremental serialization is that we will need
to deserialize the partially-unrolled data if there is not enough space to
unroll the block and the block cannot be dropped to disk. However, I'm hoping
that the memory efficiency improvements will outweigh any performance lost to
the extra deserialization in that (hopefully rare) case.
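
To make the proposal concrete, here is a minimal sketch of the two approaches.
It is not the MemoryStore code: it is written in Python, pickle stands in for
Spark's serializer, max_bytes stands in for the available unroll memory, and
every function name is hypothetical.

```python
import io
import pickle
from itertools import chain
from typing import Iterable, Iterator, Tuple, Union


def unroll_then_serialize(values: Iterable[object]) -> bytes:
    """Current behaviour (sketch): materialize every object, then serialize."""
    unrolled = list(values)  # all records are live objects at this point
    buf = io.BytesIO()
    for v in unrolled:
        pickle.dump(v, buf)  # the serialized copy grows next to the objects
    return buf.getvalue()


def deserialize(data: bytes) -> Iterator[object]:
    """Read back the records that were written one after another."""
    buf = io.BytesIO(data)
    while True:
        try:
            yield pickle.load(buf)
        except EOFError:
            return


def serialize_incrementally(
    values: Iterable[object], max_bytes: int
) -> Tuple[bool, Union[bytes, Iterator[object]]]:
    """Proposed behaviour (sketch): serialize each record as it is pulled.

    Returns (True, serialized_bytes) if the block fit within max_bytes.
    Otherwise returns (False, iterator) where the iterator yields every
    record: the already-serialized prefix is deserialized again and is
    followed by the records not yet consumed.
    """
    it = iter(values)
    buf = io.BytesIO()
    for v in it:
        pickle.dump(v, buf)  # only one record is a live object at a time
        if buf.tell() > max_bytes:
            # Assumed fallback: out of space and no disk, so hand the
            # caller an iterator over the whole block.
            return False, chain(deserialize(buf.getvalue()), it)
    return True, buf.getvalue()
```

In the success path only the serialized bytes accumulate. The fallback path
pays for deserializing the prefix that was already written, which is the
downside described above.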