[jira] [Comment Edited] (FLINK-18129) Unhandled exception stack trace from DispatcherRestEndpoint when deploying Kubernetes session cluster

Chesnay Schepler (Jira) Fri, 19 Jun 2020 06:39:15 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-18129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140556#comment-17140556
 ]


Chesnay Schepler edited comment on FLINK-18129 at 6/19/20, 1:38 PM:
--------------------------------------------------------------------

Well yes, but no. You can't guarantee that this only happens because of the 
LoadBalancer, and that there isn't some other instance where this could be an 
error you want to be visible, be it a client error or network outage.


was (Author: zentol):
Well yes, but no. You can't guarantee that this only happens because of the 
LoadBalancer, and that there isn't some other instance where this could be an 
error, be it a client error or network outage.

> Unhandled exception stack trace from DispatcherRestEndpoint when deploying 
> Kubernetes session cluster
> -----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-18129
>                 URL: https://issues.apache.org/jira/browse/FLINK-18129
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.11.0
>            Reporter: Till Rohrmann
>            Priority: Major
>             Fix For: 1.11.0
>
>
> When deploying a session cluster on Kubernetes via 
> {{bin/kubernetes-session.sh}}, I see the following stack trace in the master 
> logs:
> {code}
> 2020-06-04 01:17:52,068 WARN  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint   [] - Unhandled exception
> java.io.IOException: Connection reset by peer
>       at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_252]
>       at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_252]
>       at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_252]
>       at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_252]
>       at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377) ~[?:1.8.0_252]
>       at org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:247) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1140) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:347) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148) [flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:697) [flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:632) [flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:549) [flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:511) [flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) [flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [flink-dist_2.11-1.11.0.jar:1.11.0]
>       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
> {code}
> I am not entirely sure whether this is a configuration problem or caused by 
> a K8s service that performs liveness checks. The consequence is that the JM 
> logs are cluttered with these stack traces.
> Most likely this is not caused by Flink but by some K8s behavior. The question 
> is whether we can do something about it if it occurs often.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
