[ https://issues.apache.org/jira/browse/FLINK-18129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-18129:
----------------------------------
    Description: 
When deploying a session cluster on Kubernetes via 
{{bin/kubernetes-session.sh}}, I see the following stack trace in the master 
logs:

{code}
2020-06-04 01:17:52,068 WARN  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint   [] - Unhandled exception
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_252]
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_252]
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_252]
        at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_252]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377) ~[?:1.8.0_252]
        at org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:247) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1140) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:347) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:697) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:632) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:549) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:511) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
{code}

I am not entirely sure whether this is a configuration problem or a K8s
service performing liveness checks against the REST endpoint. The consequence
is that the JM logs are cluttered with these stack traces.
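
For reference, "Connection reset by peer" on a read means the client aborted
the connection with a TCP RST instead of closing it cleanly, which is what an
abortive TCP-level health check (e.g. a {{tcpSocket}} probe or an external
load balancer check, depending on how it closes the connection) can produce.
A minimal sketch that reproduces the reset against a locally running REST
endpoint; host and port are placeholders for this illustration:

{code}
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Connects to the REST endpoint and aborts the connection the way an
 * abortive TCP health check would. SO_LINGER with timeout 0 turns the
 * subsequent close() into a TCP RST, so the server's next read fails
 * with "Connection reset by peer". Host and port are placeholders.
 */
public class AbortiveProbe {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("localhost", 8081), 1000);
            // Linger timeout 0 => close() sends RST instead of FIN.
            socket.setSoLinger(true, 0);
        } // try-with-resources closes (and thereby resets) the socket
    }
}
{code}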

Most likely this is not caused by Flink itself but by some K8s behavior. The
question is whether we can do something about it if it occurs often.
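
If it does occur often, one option would be to catch this particular
IOException in the REST endpoint's Netty pipeline and log it at DEBUG instead
of letting it surface as an unhandled WARN. A minimal sketch of such a
handler; it is hypothetical, not Flink's actual implementation, and uses
plain Netty imports where Flink's code would use the shaded
{{org.apache.flink.shaded.netty4}} packages:

{code}
import java.io.IOException;

import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Hypothetical pipeline handler that demotes connection resets to DEBUG
 * and closes the channel, instead of letting them bubble up as
 * "Unhandled exception" WARNs.
 */
@ChannelHandler.Sharable
public class ConnectionResetFilter extends ChannelInboundHandlerAdapter {

    private static final Logger LOG =
            LoggerFactory.getLogger(ConnectionResetFilter.class);

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        if (cause instanceof IOException
                && cause.getMessage() != null
                && cause.getMessage().contains("Connection reset by peer")) {
            // The peer aborted the connection; there is nothing to send
            // back to it, so log quietly and release the channel.
            LOG.debug("Remote peer reset the connection.", cause);
            ctx.close();
        } else {
            // Preserve the existing behavior for all other exceptions.
            ctx.fireExceptionCaught(cause);
        }
    }
}
{code}

As a user-side workaround in the meantime, raising the log level of
{{org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint}} to ERROR in the
log4j configuration would suppress these WARNs, at the cost of hiding other
warnings from that endpoint.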

  was:
When deploying a session cluster on Kubernetes via 
{{bin/kubernetes-session.sh}}, I see the following stack trace in the master 
logs:

{code}
2020-06-04 01:17:52,068 WARN  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint   [] - Unhandled exception
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_252]
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_252]
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_252]
        at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_252]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377) ~[?:1.8.0_252]
        at org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:247) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1140) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:347) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:697) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:632) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:549) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:511) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [flink-dist_2.11-1.11.0.jar:1.11.0]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
{code}

I am not entirely sure whether this is a configuration problem or a K8s
service performing liveness checks against the REST endpoint. The consequence
is that the JM logs are cluttered with these stack traces.


> Unhandled exception stack trace from DispatcherRestEndpoint when deploying 
> Kubernetes session cluster
> -----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-18129
>                 URL: https://issues.apache.org/jira/browse/FLINK-18129
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.11.0
>            Reporter: Till Rohrmann
>            Priority: Major
>             Fix For: 1.11.0
>
>
> When deploying a session cluster on Kubernetes via 
> {{bin/kubernetes-session.sh}}, I see the following stack trace in the master 
> logs:
> {code}
> 2020-06-04 01:17:52,068 WARN  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint   [] - Unhandled exception
> java.io.IOException: Connection reset by peer
>       at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_252]
>       at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_252]
>       at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_252]
>       at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_252]
>       at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:377) ~[?:1.8.0_252]
>       at org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:247) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1140) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:347) ~[flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148) [flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:697) [flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:632) [flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:549) [flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:511) [flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) [flink-dist_2.11-1.11.0.jar:1.11.0]
>       at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [flink-dist_2.11-1.11.0.jar:1.11.0]
>       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
> {code}
> I am not entirely sure whether this is a configuration problem or a K8s
> service performing liveness checks against the REST endpoint. The
> consequence is that the JM logs are cluttered with these stack traces.
> Most likely this is not caused by Flink itself but by some K8s behavior.
> The question is whether we can do something about it if it occurs often.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
