[ 
https://issues.apache.org/jira/browse/FLINK-19925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229211#comment-17229211
 ] 

Robert Metzger commented on FLINK-19925:
----------------------------------------

I also noticed this exception in a CI run of the {{Local recovery and sticky 
scheduling end-to-end test}}. Because of another bug (FLINK-19882), it was not 
reported as a test failure:
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9389&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529

{code}
2020-11-10T12:19:33.7135577Z Nov 10 12:19:33 2020-11-10 12:19:26,076 WARN  
akka.remote.ReliableDeliverySupervisor                       [] - Association 
with remote system [akka.tcp://[email protected]:42025] has failed, address is now 
gated for [50] ms. Reason: [Disassociated] 
2020-11-10T12:19:33.7137278Z Nov 10 12:19:33 2020-11-10 12:19:26,078 WARN  
akka.remote.ReliableDeliverySupervisor                       [] - Association 
with remote system [akka.tcp://[email protected]:39283] has failed, 
address is now gated for [50] ms. Reason: [Disassociated] 
2020-11-10T12:19:33.7138935Z Nov 10 12:19:33 2020-11-10 12:19:26,305 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Flat Map -> 
Sink: Unnamed (2/4) (3bd0efab238877e710563ad4f49f87a0) switched from RUNNING to 
FAILED on 
org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@7d7cbe96.
2020-11-10T12:19:33.7140369Z Nov 10 12:19:33 
org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: 
readAddress(..) failed: Connection reset by peer (connection to 
'10.1.0.4/10.1.0.4:43007')
2020-11-10T12:19:33.7141721Z Nov 10 12:19:33    at 
org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.exceptionCaught(CreditBasedPartitionRequestClientHandler.java:173)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-10T12:19:33.7143118Z Nov 10 12:19:33    at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-10T12:19:33.7144513Z Nov 10 12:19:33    at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-10T12:19:33.7145905Z Nov 10 12:19:33    at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-10T12:19:33.7147250Z Nov 10 12:19:33    at 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-10T12:19:33.7148622Z Nov 10 12:19:33    at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-10T12:19:33.7149987Z Nov 10 12:19:33    at 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-10T12:19:33.7151331Z Nov 10 12:19:33    at 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-10T12:19:33.7152923Z Nov 10 12:19:33    at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.handleReadException(AbstractEpollStreamChannel.java:728)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-10T12:19:33.7154494Z Nov 10 12:19:33    at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:818)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-10T12:19:33.7155806Z Nov 10 12:19:33    at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-10T12:19:33.7157051Z Nov 10 12:19:33    at 
org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-10T12:19:33.7158290Z Nov 10 12:19:33    at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-10T12:19:33.7160165Z Nov 10 12:19:33    at 
org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
 ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-10T12:19:33.7160852Z Nov 10 12:19:33    at 
java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_272]
2020-11-10T12:19:33.7161480Z Nov 10 12:19:33 Caused by: 
org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors$NativeIoException: 
readAddress(..) failed: Connection reset by peer
2020-11-10T12:19:33.7162958Z Nov 10 12:19:33 2020-11-10 12:19:26,366 DEBUG 
org.apache.flink.runtime.scheduler.SharedSlot                [] - Remove 
logical slot (SlotRequestId{8e34fc6893cd64b17fb0125192c96e18}) for execution 
vertex (id 20ba6b65f97481d5570070de90e4e791_1) from the physical slot 
(SlotRequestId{ac67d850d673c4ceb476a5fb5cd40cba})
{code}

> Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
> --------------------------------------------------------------------------
>
>                 Key: FLINK-19925
>                 URL: https://issues.apache.org/jira/browse/FLINK-19925
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.12.0
>            Reporter: godfrey he
>            Priority: Major
>             Fix For: 1.12.0
>
>
> Errors$NativeIoException occurs sometimes when we run TPC-DS on master. 
> The full exception stack is:
> {code:java}
> Caused by: 
> org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: 
> readAddress(..) failed: Connection reset by peer (connection to 'xxx')
>       at 
> org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.exceptionCaught(CreditBasedPartitionRequestClientHandler.java:173)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.handleReadException(AbstractEpollStreamChannel.java:728)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:818)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at 
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>  ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>       at java.lang.Thread.run(Thread.java:834) ~[?:1.8.0_102]
> Caused by: 
> org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors$NativeIoException:
>  readAddress(..) failed: Connection reset by peer
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
