[jira] [Comment Edited] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

belvey (JIRA) Wed, 24 Apr 2019 22:58:43 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806307#comment-16806307
 ]


belvey edited comment on SPARK-13510 at 4/25/19 5:57 AM:
---------------------------------------------------------

[~shenhong] , hello hongsheng, I am using spark2.0 facing the same issue,  I am 
not sure if it's merged into spark2. it's very kind for you to post your pr.

 edit:

I found that solution had already been added to spark2.3 and later.  i am not 
sure if is hong shen's pr , but the solution is similar to what hong shen's 
said. And for spark2.3 and later we can use 
"spark.maxRemoteBlockSizeFetchToMem" to control the max block size allowed for 
shuffle fetching data that cached in memory, it's default value is 
(Interger.max-512)  bytes.


was (Author: belvey):
[~shenhong] , hello hongsheng, I am using spark2.0 facing the same issue,  I am 
not sure if it's merged into spark2. it's very kind for you to post your pr.

 edit:

I found that solution had already been added to spark2.3 and later.  i am not 
sure if is hong shen's pr , but the solution is similar to what hong shen's 
said. And for spark2.3 and later we can use 
"spark.maxRemoteBlockSizeFetchToMem" to control the max block size allowed for 
shuffle fetching data that catched in memory, it's default value is 
(Interger.max-512)  bytes.

> Shuffle may throw FetchFailedException: Direct buffer memory
> ------------------------------------------------------------
>
>                 Key: SPARK-13510
>                 URL: https://issues.apache.org/jira/browse/SPARK-13510
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.0
>            Reporter: Hong Shen
>            Priority: Major
>         Attachments: spark-13510.diff
>
>
> In our cluster, when I test spark-1.6.0 with a sql, it throw exception and 
> failed.
> {code}
> 16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request 
> for 1 blocks (915.4 MB) from 10.196.134.220:7337
> 16/02/17 15:36:03 INFO shuffle.ExternalShuffleClient: External shuffle fetch 
> from 10.196.134.220:7337 (executor id 122)
> 16/02/17 15:36:03 INFO client.TransportClient: Sending fetch chunk request 0 
> to /10.196.134.220:7337
> 16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in 
> connection from /10.196.134.220:7337
> java.lang.OutOfMemoryError: Direct buffer memory
>       at java.nio.Bits.reserveMemory(Bits.java:658)
>       at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
>       at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
>       at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
>       at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
>       at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
>       at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
>       at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
>       at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
>       at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
>       at 
> io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
>       at 
> io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
>       at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
>       at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>       at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>       at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>       at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>       at java.lang.Thread.run(Thread.java:744)
> 16/02/17 15:36:36 ERROR client.TransportResponseHandler: Still have 1 
> requests outstanding when connection from /10.196.134.220:7337 is closed
> 16/02/17 15:36:36 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block 
> shuffle_3_81_2, and will not retry (0 retries)
> {code}
>   The reason is that when shuffle a big block(like 1G), task will allocate 
> the same memory, it will easily throw "FetchFailedException: Direct buffer 
> memory".
>   If I add -Dio.netty.noUnsafe=true spark.executor.extraJavaOptions, it will 
> throw 
> {code}
> java.lang.OutOfMemoryError: Java heap space
>         at 
> io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
>         at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
>         at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
>         at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
> {code}
>   
>   In mapreduce shuffle, it will firstly judge whether the block can cache in 
> memery, but spark doesn't. 
>   If the block is more than we can cache in memory, we  should write to disk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Comment Edited] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

Reply via email to