[ https://issues.apache.org/jira/browse/SPARK-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hong Shen updated SPARK-13510:
------------------------------
    Description: 
In our cluster, when I test Spark 1.6.0 with a SQL query, it throws an
exception and fails.
{code}
16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request for 1 blocks (915.4 MB) from 10.196.134.220:7337
16/02/17 15:36:03 INFO shuffle.ExternalShuffleClient: External shuffle fetch from 10.196.134.220:7337 (executor id 122)
16/02/17 15:36:03 INFO client.TransportClient: Sending fetch chunk request 0 to /10.196.134.220:7337
16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in connection from /10.196.134.220:7337
java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:658)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
        at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
        at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
        at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
        at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
        at io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:744)
16/02/17 15:36:36 ERROR client.TransportResponseHandler: Still have 1 requests outstanding when connection from /10.196.134.220:7337 is closed
16/02/17 15:36:36 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block shuffle_3_81_2, and will not retry (0 retries)
{code}
  The reason is that when a task fetches a big shuffle block (around 1 GB), it
allocates a buffer of the same size, so it can easily throw
"FetchFailedException: Direct buffer memory".
  If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it
throws this instead:
{code}
java.lang.OutOfMemoryError: Java heap space
        at io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
        at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
{code}
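
  The first trace allocates through ByteBuffer.allocateDirect, so the buffer
counts against the JVM's direct buffer pool (capped by -XX:MaxDirectMemorySize,
which on HotSpot defaults to roughly the maximum heap size). As a rough way to
see how much a single allocation costs, the sketch below reads the "direct"
pool through the standard BufferPoolMXBean. The class and method names
(DirectMemoryProbe, directBytesUsed, growthAfterAllocating) are made up for
illustration and are not part of Spark or Netty.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper, not Spark or Netty code.
final class DirectMemoryProbe {

    /** Bytes currently used by the JVM's "direct" buffer pool, or -1 if the pool is not exposed. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    /** Allocates one direct buffer of the given size and returns how much the pool grew. */
    static long growthAfterAllocating(int bytes) {
        long before = directBytesUsed();
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes);
        long after = directBytesUsed();
        // Use the buffer after measuring so it is still reachable at that point.
        buffer.put(0, (byte) 1);
        return after - before;
    }

    public static void main(String[] args) {
        int oneMiB = 1 << 20;
        System.out.println("direct pool grew by " + growthAfterAllocating(oneMiB) + " bytes");
        System.out.println("JVM max heap: " + Runtime.getRuntime().maxMemory() + " bytes");
    }
}
```

  A fetch that needs one buffer as large as the block (915.4 MB in the log
above) takes that whole amount from the pool at once, whether the pool is
direct memory or, with noUnsafe, the heap.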
  
  In MapReduce shuffle, the fetcher first checks whether the block can be
cached in memory, but Spark doesn't.
  If the block is larger than what we can cache in memory, we should write it
to disk.
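
  A minimal sketch of that check, under stated assumptions: all names here
(ShuffleFetchSketch, Fetched, fitsInMemory, fetch) are hypothetical, a plain
InputStream stands in for the network channel, and the memory budget is passed
in by the caller. It is not the actual Spark or MapReduce code, only an
illustration of "buffer small blocks, stream large ones to disk".

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not Spark or MapReduce code.
final class ShuffleFetchSketch {

    /** Result of a fetch: exactly one of inMemory or onDisk is non-null. */
    static final class Fetched {
        final byte[] inMemory;
        final Path onDisk;

        Fetched(byte[] inMemory, Path onDisk) {
            this.inMemory = inMemory;
            this.onDisk = onDisk;
        }
    }

    /** True when the block fits in the part of the budget that is still free. */
    static boolean fitsInMemory(long blockSize, long memoryBudget, long memoryInUse) {
        return blockSize >= 0 && blockSize <= memoryBudget - memoryInUse;
    }

    /** Buffers the block in memory if it fits, otherwise streams it to a temp file. */
    static Fetched fetch(InputStream block, long blockSize, long memoryBudget, long memoryInUse)
            throws IOException {
        // A Java array cannot hold more than Integer.MAX_VALUE bytes, so larger blocks always spill.
        if (blockSize <= Integer.MAX_VALUE - 8 && fitsInMemory(blockSize, memoryBudget, memoryInUse)) {
            byte[] data = new byte[(int) blockSize];
            int read = 0;
            while (read < data.length) {
                int n = block.read(data, read, data.length - read);
                if (n < 0) {
                    throw new EOFException("block ended after " + read + " bytes");
                }
                read += n;
            }
            return new Fetched(data, null);
        }
        // Too big for the budget: copy through a small fixed-size chunk instead of one huge buffer.
        Path file = Files.createTempFile("shuffle-block-", ".data");
        byte[] chunk = new byte[64 * 1024];
        try (OutputStream out = Files.newOutputStream(file)) {
            int n;
            while ((n = block.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return new Fetched(null, file);
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[200_000];
        Fetched result = fetch(new ByteArrayInputStream(payload), payload.length, 100_000, 0);
        System.out.println(result.onDisk != null ? "spilled to " + result.onDisk : "kept in memory");
        if (result.onDisk != null) {
            Files.delete(result.onDisk);
        }
    }
}
```

  With this shape, the memory needed for a large block is bounded by the chunk
size rather than by the block size.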


> Shuffle may throw FetchFailedException: Direct buffer memory
> ------------------------------------------------------------
>
>                 Key: SPARK-13510
>                 URL: https://issues.apache.org/jira/browse/SPARK-13510
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.0
>            Reporter: Hong Shen


