Vincent created SPARK-24917:
-------------------------------

             Summary: Sending a partition over netty results in 2x memory usage
                 Key: SPARK-24917
                 URL: https://issues.apache.org/jira/browse/SPARK-24917
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.2.2
            Reporter: Vincent


Hello,

While investigating some OOM errors in Spark 2.2 [(here's my call 
stack)|https://image.ibb.co/hHa2R8/sparkOOM.png], I found the following 
behavior, which I think is weird:
 * a request comes in to send a partition over the network
 * this partition is 1.9 GB and is persisted in memory
 * this partition is apparently stored in a ByteBufferBlockData, which is 
made of a ChunkedByteBuffer, i.e. a list of (lots of) 4 MB ByteBuffers
 * the call to toNetty() is supposed to only wrap all the arrays, not 
allocate any memory
 * yet the call stack shows that netty is allocating memory and trying to 
consolidate all the chunks into one big 1.9 GB array (see the sketch after 
this list)
 * this means that at this point the memory footprint is 2x the size of the 
actual partition, which is huge when the partition is 1.9 GB

Is this transient allocation expected?

After digging, it turns out that the actual copy is due to [this 
method|https://github.com/netty/netty/blob/4.0/buffer/src/main/java/io/netty/buffer/Unpooled.java#L260]
 in netty: if my initial buffer is made of more than DEFAULT_MAX_COMPONENTS 
(16) components, it triggers a reallocation of the whole buffer. This netty 
issue was fixed in this recent change: 
[https://github.com/netty/netty/commit/9b95b8ee628983e3e4434da93fffb893edff4aa2]
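
In the meantime, netty 4.0 already seems to expose a wrappedBuffer overload 
that takes an explicit maxNumComponents, so maybe toNetty() could pass the 
chunk count there and stay below the consolidation threshold. A hypothetical 
sketch (ToNettyWorkaround and its signature are mine, not actual Spark code):

{code:java}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

import java.nio.ByteBuffer;

public class ToNettyWorkaround {
    /**
     * Hypothetical replacement for ChunkedByteBuffer.toNetty(): passing
     * the chunk count as maxNumComponents keeps the composite buffer below
     * its consolidation threshold, so the 4 MB chunks are wrapped in place
     * instead of being copied into one big 1.9 GB array.
     */
    public static ByteBuf toNetty(ByteBuffer[] chunks) {
        return Unpooled.wrappedBuffer(chunks.length, chunks);
    }
}
{code}

I have not verified this on a real 1.9 GB partition; it just avoids the 
consolidation path as far as I can tell from the netty source.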

As a result, is it possible to benefit from this change somehow in Spark 2.2 
and above? I don't know how the netty dependencies are handled for Spark.

Thanks!
