[
https://issues.apache.org/jira/browse/CASSANDRA-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072138#comment-16072138
]
Sylvain Lebresne commented on CASSANDRA-13649:
----------------------------------------------
I'll note that I've seen those as well, and I'm not 100% sure this isn't a "bug"
in Netty's epoll implementation, since I don't think we get this when the NIO
transport is used. That is, it looks like the epoll and NIO Netty
implementations don't behave the same way with respect to this. I'll also note
that we do catch exceptions in the pipeline in
{{Message.Dispatcher.exceptionCaught}}, without re-throwing them, and I'm not
sure why that wouldn't catch this (in fact, I believe this does catch the
exception with the NIO event loop, and that's why I'm suggesting the epoll one
may not be doing its job properly). I haven't investigated much though, to be
honest, so take this with a grain of salt.
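
If the warning does turn out to be harmless, the dtest-side workaround
mentioned in the description could look roughly like the sketch below. This is
only an illustration: {{BENIGN_PATTERNS}} and {{unexpected_errors}} are
hypothetical names, not existing dtest helpers, and it assumes each log entry
(message plus stack trace) has already been collected into a single string.

```python
import re

# Hypothetical allow-list of log noise considered harmless. Only the
# "Connection reset by peer" read failure from the description is listed;
# both the list and its name are assumptions, not existing dtest code.
BENIGN_PATTERNS = [
    re.compile(
        r"NativeIoException: syscall:read\(\.\.\.\)\(\) failed:\s+"
        r"Connection reset by peer"
    ),
]


def unexpected_errors(entries):
    """Return the log entries that match none of the benign patterns."""
    return [
        entry
        for entry in entries
        if not any(pattern.search(entry) for pattern in BENIGN_PATTERNS)
    ]
```

With this, a connection reset logged by the epoll event loop would be dropped
from the list a test asserts on, while anything else, such as the
{{ProtocolException}} above, would still be reported and fail the test.
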
> Uncaught exceptions in Netty pipeline
> -------------------------------------
>
> Key: CASSANDRA-13649
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13649
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stefan Podkowinski
> Attachments: test_stdout.txt
>
>
> I've noticed some Netty-related errors in trunk in [some of the dtest
> results|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/106/#showFailuresLink].
> Just want to make sure that we don't have to change anything related to the
> exception handling in our pipeline and that this isn't a netty issue.
> Actually if this causes flakiness but is otherwise harmless, we should do
> something about it, even if it's just on the dtest side.
> {noformat}
> WARN [epollEventLoopGroup-2-9] 2017-06-28 17:23:49,699 Slf4JLogger.java:151
> - An exceptionCaught() event was fired, and it reached at the tail of the
> pipeline. It usually means the last handler in the pipeline did not handle
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed:
> Connection reset by peer
> at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> And again in another test:
> {noformat}
> WARN [epollEventLoopGroup-2-8] 2017-06-29 02:27:31,300 Slf4JLogger.java:151
> - An exceptionCaught() event was fired, and it reached at the tail of the
> pipeline. It usually means the last handler in the pipeline did not handle
> the exception.
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)() failed:
> Connection reset by peer
> at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown
> Source) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> {noformat}
> This one also looks odd and makes
> upgrade_internal_auth_test.TestAuthUpgrade.upgrade_to_30_test fail:
> {noformat}
> WARN [epollEventLoopGroup-2-9] 2017-06-29 02:41:37,125 Slf4JLogger.java:151
> - An exceptionCaught() event was fired, and it reached at the tail of the
> pipeline. It usually means the last handler in the pipeline did not handle
> the exception.
> io.netty.handler.codec.DecoderException:
> org.apache.cassandra.transport.ProtocolException: Invalid or unsupported
> protocol version: 4
> at
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> at
> io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:375)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> at
> io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:342)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> at
> io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:325)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> at
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> at
> io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:893)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> at
> io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:691)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> at
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
> [netty-all-4.0.44.Final.jar:4.0.44.Final]
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:307)
> [netty-all-4.0.44.Final.jar:4.0.44.Final]
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
> [netty-all-4.0.44.Final.jar:4.0.44.Final]
> at
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
> [netty-all-4.0.44.Final.jar:4.0.44.Final]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
> Caused by: org.apache.cassandra.transport.ProtocolException: Invalid or
> unsupported protocol version: 4
> at org.apache.cassandra.transport.Frame$Decoder.decode(Frame.java:186)
> ~[main/:na]
> at
> io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
> ... 16 common frames omitted
> {noformat}
> /cc [~jasobrown], [~norman]
> Edit:
> The {{io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)()
> failed}} error also causes tests to fail for 3.0 and 3.11.