[
https://issues.apache.org/jira/browse/SPARK-28726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
angerszhu updated SPARK-28726:
------------------------------
Description:
When use Spark with dynamic allocation, we set idle time to 5s
We always got exception about neety 'Connect reset by peers'
I suspect that it's because we set idle time 5s is too small, it will cause
when Blockmanager call netty io, the executor has been remove because of
timeout.
But not timely notify driver's BlocakManager
{code:java}
19/08/14 00:00:46 WARN org.apache.spark.network.server.TransportChannelHandler:
"Exception in connection from /host:port"
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
at
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
at
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
--
19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint:
"Error trying to remove broadcast 67 from block manager BlockManagerId(967,
host, port, None)"
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at
io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
at
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
at
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
--
19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator
162174"
19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Failed to
remove shuffle 22 - Connection reset by peer"
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39){code}
> Spark with DynamicAllocation always got connect rest by peers
> -------------------------------------------------------------
>
> Key: SPARK-28726
> URL: https://issues.apache.org/jira/browse/SPARK-28726
> Project: Spark
> Issue Type: Wish
> Components: Spark Core
> Affects Versions: 2.4.0
> Reporter: angerszhu
> Priority: Major
>
> When use Spark with dynamic allocation, we set idle time to 5s
> We always got exception about neety 'Connect reset by peers'
>
> I suspect that it's because we set idle time 5s is too small, it will cause
> when Blockmanager call netty io, the executor has been remove because of
> timeout.
> But not timely notify driver's BlocakManager
> {code:java}
> 19/08/14 00:00:46 WARN
> org.apache.spark.network.server.TransportChannelHandler: "Exception in
> connection from /host:port"
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
> at
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
> at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
> at
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
> at
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMasterEndpoint:
> "Error trying to remove broadcast 67 from block manager BlockManagerId(967,
> host, port, None)"
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
> at
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
> at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
> at
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
> at
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> --
> 19/08/14 00:00:46 INFO org.apache.spark.ContextCleaner: "Cleaned accumulator
> 162174"
> 19/08/14 00:00:46 WARN org.apache.spark.storage.BlockManagerMaster: "Failed
> to remove shuffle 22 - Connection reset by peer"
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39){code}
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]