[
https://issues.apache.org/jira/browse/CASSANDRA-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17352030#comment-17352030
]
Brandon Williams edited comment on CASSANDRA-16677 at 5/26/21, 7:23 PM:
------------------------------------------------------------------------
I'm giving up on this for now, as it can only damage my calm further to
continue.
What I've found is that sometimes you will see a stack that seems to indicate
the channel was already closed:
{noformat}
[junit-timeout] WARN
[Messaging-OUT-/127.0.0.1:7012->/127.0.0.1:7012-LARGE_MESSAGES] 2021-05-26
16:38:31,164 OutboundConnection.java:492 -
/127.0.0.1:7012->/127.0.0.1:7012-LARGE_MESSAGES-[no-channel] dropping message
of type _TEST_1 due to error
[junit-timeout] org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException:
This output stream is in an unsafe state after an asynchronous flush failed
[junit-timeout] at
org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:209)
[junit-timeout] at
org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:166)
[junit-timeout] at
org.apache.cassandra.net.AsyncChannelOutputPlus.flush(AsyncChannelOutputPlus.java:238)
[junit-timeout] at
org.apache.cassandra.net.AsyncChannelOutputPlus.close(AsyncChannelOutputPlus.java:259)
[junit-timeout] at
org.apache.cassandra.net.AsyncMessageOutputPlus.close(AsyncMessageOutputPlus.java:117)
[junit-timeout] at
org.apache.cassandra.net.OutboundConnection$LargeMessageDelivery.doRun(OutboundConnection.java:1002)
[junit-timeout] at
org.apache.cassandra.net.OutboundConnection$Delivery.run(OutboundConnection.java:691)
[junit-timeout] at
org.apache.cassandra.net.OutboundConnection$LargeMessageDelivery.run(OutboundConnection.java:960)
[junit-timeout] at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[junit-timeout] at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[junit-timeout] at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[junit-timeout] at java.lang.Thread.run(Thread.java:748)
{noformat}
if you add a test for the channel being open before this, it passes. Other
times the test will fail in what appears to be the same manner, but there will
be no stacktrace at all. Most of the failures occur with LARGE_MESSAGES, but
given enough attempts there is occasionally one in SMALL_MESSAGES.
Then there is the unique stack that Berenguer linked, which I have yet to
encounter myself. It is odd too, in that according to it both small and large
messages are expecting 30 bytes and only getting 29, but all my debugging shows
29 bytes working just fine for large messages, and 11 for small (where the
stack wants 14, receiving 13.)
was (Author: brandon.williams):
I'm giving up on this for now, as it can only damage my calm further to
continue.
What I've found is that sometimes you will see a stack that seems to indicate
the channel was already closed:
{noformat}
[junit-timeout] WARN
[Messaging-OUT-/127.0.0.1:7012->/127.0.0.1:7012-LARGE_MESSAGES] 2021-05-26
16:38:31,164 OutboundConnection.java:492 -
/127.0.0.1:7012->/127.0.0.1:7012-LARGE_MESSAGES-[no-channel] dropping message
of type _TEST_1 due to error
[junit-timeout] org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException:
This output stream is in an unsafe state after an asynchronous flush failed
[junit-timeout] at
org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:209)
[junit-timeout] at
org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:166)
[junit-timeout] at
org.apache.cassandra.net.AsyncChannelOutputPlus.flush(AsyncChannelOutputPlus.java:238)
[junit-timeout] at
org.apache.cassandra.net.AsyncChannelOutputPlus.close(AsyncChannelOutputPlus.java:259)
[junit-timeout] at
org.apache.cassandra.net.AsyncMessageOutputPlus.close(AsyncMessageOutputPlus.java:117)
[junit-timeout] at
org.apache.cassandra.net.OutboundConnection$LargeMessageDelivery.doRun(OutboundConnection.java:1002)
[junit-timeout] at
org.apache.cassandra.net.OutboundConnection$Delivery.run(OutboundConnection.java:691)
[junit-timeout] at
org.apache.cassandra.net.OutboundConnection$LargeMessageDelivery.run(OutboundConnection.java:960)
[junit-timeout] at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[junit-timeout] at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[junit-timeout] at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[junit-timeout] at java.lang.Thread.run(Thread.java:748)
{noformat}
if you add a test for the channel being open before this, it passes. Other
times the test will fail in what appears to be the same manner, but there will
be no stacktrace at all. Most of the failures occur with LARGE_MESSAGES, but
given enough attemps there is occasionally one in SMALL_MESSAGES.
Then there is the unique stack that Berenguer linked, which I have yet to
encounter myself. It is odd too, in that according to it both small and large
messages are expecting 30 bytes and only getting 29, but all my debugging shows
29 bytes working just fine for large messages, and 11 for small (where the
stack wants 14, receiving 13.)
> Fix flaky ConnectionTest.testMessageDeliveryOnReconnect
> -------------------------------------------------------
>
> Key: CASSANDRA-16677
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16677
> Project: Cassandra
> Issue Type: Bug
> Components: Messaging/Internode
> Reporter: Brandon Williams
> Priority: Normal
>
> https://ci-cassandra.apache.org/job/Cassandra-devbranch/785/testReport/junit/org.apache.cassandra.net/ConnectionTest/testMessageDeliveryOnReconnect_compression/
> https://app.circleci.com/pipelines/github/adelapena/cassandra/460/workflows/cf7dcec6-612c-45d1-8471-623bde481dca/jobs/4069
> https://app.circleci.com/pipelines/github/adelapena/cassandra/460/workflows/b750cd38-0263-4b5e-9bb8-a8be98214378/jobs/4065
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]