[
https://issues.apache.org/jira/browse/CASSANDRA-15958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185251#comment-17185251
]
Adam Holmberg edited comment on CASSANDRA-15958 at 8/26/20, 3:03 PM:
---------------------------------------------------------------------
Looked at the {{OutboundMessageQueue}} issue a bit more. Put more simply,
there's a race when one thread is [adding to this queue and updating the expiration
deadline|https://github.com/apache/cassandra/blob/405e2dd8b5610208596ab4cb0bb6b9be7a159f5e/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L89-L92]
while another thread is draining (
[1|https://github.com/apache/cassandra/blob/405e2dd8b5610208596ab4cb0bb6b9be7a159f5e/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L256]
[2|https://github.com/apache/cassandra/blob/405e2dd8b5610208596ab4cb0bb6b9be7a159f5e/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L148]
[3|https://github.com/apache/cassandra/blob/405e2dd8b5610208596ab4cb0bb6b9be7a159f5e/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L484]
) and also updating the deadline
([1|https://github.com/apache/cassandra/blob/405e2dd8b5610208596ab4cb0bb6b9be7a159f5e/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L288][2|https://github.com/apache/cassandra/blob/405e2dd8b5610208596ab4cb0bb6b9be7a159f5e/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L488]).
The race is real, but I'm not certain it would be a problem in a running
server, since nothing there spins on an inactive queue waiting for messages to
be evacuated, the way this test does. In other words, new incoming messages and
ongoing delivery would naturally clear the stuck state.
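To make the interleaving concrete, here is a minimal sketch of the kind of lost update described above. It is a simplified, hypothetical model and not the actual {{OutboundMessageQueue}} code: the class name, the method names, and the use of a plain {{long}} per message are all assumptions made for illustration. It replays one possible ordering on a single thread so the outcome is deterministic.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, simplified model; NOT the real OutboundMessageQueue.
final class ExpiryRaceSketch {
    static final long NONE = Long.MAX_VALUE;      // means "no deadline pending"

    final Queue<Long> queue = new ArrayDeque<>(); // each entry is a message's expiry time
    long nextDeadline = NONE;                     // earliest expiry the pruner knows about

    // Producer, step 1: enqueue the message.
    void enqueue(long expiresAt) { queue.add(expiresAt); }

    // Producer, step 2: lower the deadline if this message expires sooner.
    void lowerDeadline(long expiresAt) { nextDeadline = Math.min(nextDeadline, expiresAt); }

    // Drainer, step 1: compute the earliest expiry among currently visible messages.
    long computeEarliest() {
        long earliest = NONE;
        for (long e : queue) earliest = Math.min(earliest, e);
        return earliest;
    }

    // Drainer, step 2: publish the value computed earlier (possibly stale by now).
    void publishDeadline(long computed) { nextDeadline = computed; }

    // A pruner only does work once the clock reaches the recorded deadline.
    boolean wouldPrune(long now) { return now >= nextDeadline; }

    // Replays one problematic interleaving of producer and drainer steps.
    static ExpiryRaceSketch replayLostUpdate() {
        ExpiryRaceSketch q = new ExpiryRaceSketch();
        long stale = q.computeEarliest(); // drainer sees an empty queue, computes NONE
        q.enqueue(100);                   // producer adds a message expiring at t=100
        q.lowerDeadline(100);             // producer records deadline 100
        q.publishDeadline(stale);         // drainer overwrites it with the stale NONE
        return q;
    }

    public static void main(String[] args) {
        ExpiryRaceSketch q = replayLostUpdate();
        System.out.println("queued=" + q.queue.size()
            + " deadlineLost=" + (q.nextDeadline == NONE)
            + " wouldPruneAt200=" + q.wouldPrune(200));
    }
}
```

In this model the message stays queued with no recorded deadline until some later add or drain recomputes it, which is consistent with the observation that a busy server would recover on its own while a test that only waits would time out.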
Two ways to proceed:
1.) We can agree that it's not a problem, and I can make this test not
susceptible to the timeout.
2.) We try to fix it by adding synchronization around both the external queue
and the expiry update. I would need to expand the analysis quite a bit to
understand what performance implications that might have (since the apparent
point of the two-queue design is efficiency).
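For reference, option 2 could look roughly like the sketch below. This is an assumption-laden illustration rather than a proposed patch: {{LockedExpiryQueue}} and its methods are hypothetical names, messages are reduced to their expiry time, and a single monitor guards both the queue and the deadline. That removes the lost update, at the cost of making producers and the drainer contend on one lock, which is exactly the performance question that would need analysis.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;

// Hypothetical sketch of "synchronize the queue and the expiry update together".
final class LockedExpiryQueue {
    static final long NONE = Long.MAX_VALUE;       // means "no deadline pending"

    private final Object lock = new Object();
    private final Queue<Long> pending = new ArrayDeque<>(); // message expiry times
    private long nextDeadline = NONE;

    // Enqueue and lower the deadline as one atomic step.
    void add(long expiresAt) {
        synchronized (lock) {
            pending.add(expiresAt);
            nextDeadline = Math.min(nextDeadline, expiresAt);
        }
    }

    // Drop entries that expired at or before 'now' and recompute the deadline
    // from what remains, all under the same lock. Returns the number dropped.
    int pruneExpired(long now) {
        synchronized (lock) {
            int removed = 0;
            long earliest = NONE;
            for (Iterator<Long> it = pending.iterator(); it.hasNext(); ) {
                long e = it.next();
                if (e <= now) { it.remove(); removed++; }
                else earliest = Math.min(earliest, e);
            }
            nextDeadline = earliest;
            return removed;
        }
    }

    long nextDeadline() { synchronized (lock) { return nextDeadline; } }

    // True when the recorded deadline equals the earliest expiry still queued.
    boolean deadlineMatchesContents() {
        synchronized (lock) {
            long earliest = NONE;
            for (long e : pending) earliest = Math.min(earliest, e);
            return earliest == nextDeadline;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedExpiryQueue q = new LockedExpiryQueue();
        Thread producer = new Thread(() -> { for (long t = 1; t <= 10_000; t++) q.add(t); });
        Thread pruner = new Thread(() -> { for (int i = 0; i < 10_000; i++) q.pruneExpired(5_000); });
        producer.start();
        pruner.start();
        producer.join();
        pruner.join();
        System.out.println("deadline consistent: " + q.deadlineMatchesContents());
    }
}
```

Because every mutation of the queue and the deadline happens under the same lock, the deadline always reflects the queue contents once an operation returns, regardless of how the two threads interleave.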
[~yifanc] I'm interested in your take.
Also /cc [~benedict] [~aleksey] for ideas, since this is part of your newish
messaging rewrite.
> org.apache.cassandra.net.ConnectionTest testMessagePurging
> ----------------------------------------------------------
>
> Key: CASSANDRA-15958
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15958
> Project: Cassandra
> Issue Type: Bug
> Components: Test/unit
> Reporter: David Capwell
> Assignee: Adam Holmberg
> Priority: Normal
> Fix For: 4.0-beta
>
>
> Build:
> https://ci-cassandra.apache.org/job/Cassandra-trunk-test/196/testReport/junit/org.apache.cassandra.net/ConnectionTest/testMessagePurging/
> Build:
> https://ci-cassandra.apache.org/job/Cassandra-trunk-test/194/testReport/junit/org.apache.cassandra.net/ConnectionTest/testMessagePurging/
> java.util.concurrent.TimeoutException
> at org.apache.cassandra.net.AsyncPromise.get(AsyncPromise.java:258)
> at org.apache.cassandra.net.FutureDelegate.get(FutureDelegate.java:143)
> at org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:268)
> at org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:236)
> at org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:679)