[ https://issues.apache.org/jira/browse/ARTEMIS-4476?focusedWorklogId=893144&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-893144 ]
ASF GitHub Bot logged work on ARTEMIS-4476:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 30/Nov/23 11:24
Start Date: 30/Nov/23 11:24
Worklog Time Spent: 10m
Work Description: gtully commented on code in PR #4694:
URL: https://github.com/apache/activemq-artemis/pull/4694#discussion_r1410534274
##########
artemis-protocols/artemis-openwire-protocol/src/main/java/org/apache/activemq/artemis/core/protocol/openwire/OpenWireConnection.java:
##########
@@ -761,7 +761,11 @@ public void fail(ActiveMQException me, String message) {
       final ThresholdActor<Command> localVisibleActor = openWireActor;
       if (localVisibleActor != null) {
-         localVisibleActor.shutdown(() -> doFail(me, message));
+         localVisibleActor.requestShutdown();
+      }
+
+      if (executor != null) {
+         executor.execute(() -> doFail(me, message));
Review Comment:
I don't follow. The point is to terminate processing of commands and execute
doFail as the last (i.e. next) task. The only call to fail should come from the
Netty socket handler when it sees a socket error, a remote close, etc.; that
is, it is the transport initiating a close on a socket error.
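
To make the intended ordering concrete, here is a minimal sketch of "stop
processing commands and run the failure callback as the next and last task".
MiniActor and MiniActorDemo are hypothetical names used only for illustration;
this is not the Artemis ThresholdActor and makes no claim about how it is
implemented. The executor is a hand-drained queue so the ordering is
deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Executor;

/** Hypothetical ordered actor; a stand-in for illustration only. */
final class MiniActor {
   private final Executor executor;
   private final Queue<Runnable> pending = new ArrayDeque<>();
   private boolean shutdown;   // guarded by this
   private boolean scheduled;  // guarded by this

   MiniActor(Executor executor) {
      this.executor = executor;
   }

   /** Queues a command; returns false (and drops it) once shut down. */
   synchronized boolean act(Runnable command) {
      if (shutdown) {
         return false;
      }
      pending.add(command);
      schedule();
      return true;
   }

   /** Discards queued commands and runs onShutdown as the next, final task. */
   synchronized void shutdown(Runnable onShutdown) {
      if (shutdown) {
         return;
      }
      shutdown = true;
      pending.clear();
      pending.add(onShutdown);
      schedule();
   }

   // Caller must hold the monitor.
   private void schedule() {
      if (!scheduled) {
         scheduled = true;
         executor.execute(this::drain);
      }
   }

   // Runs queued tasks one at a time, in order, outside the lock.
   private void drain() {
      for (;;) {
         Runnable next;
         synchronized (this) {
            next = pending.poll();
            if (next == null) {
               scheduled = false;
               return;
            }
         }
         next.run();
      }
   }
}

final class MiniActorDemo {
   static List<String> run() {
      List<String> log = new ArrayList<>();
      Queue<Runnable> tasks = new ArrayDeque<>();
      Executor manual = tasks::add;  // tasks run only when drained below
      MiniActor actor = new MiniActor(manual);

      actor.act(() -> log.add("command-1"));
      while (!tasks.isEmpty()) {
         tasks.poll().run();         // command-1 is processed normally
      }

      actor.act(() -> log.add("command-2"));
      actor.act(() -> log.add("command-3"));
      actor.shutdown(() -> log.add("doFail"));  // simulated socket failure
      actor.act(() -> log.add("command-4"));    // rejected after shutdown

      while (!tasks.isEmpty()) {
         tasks.poll().run();
      }
      return log;
   }

   public static void main(String[] args) {
      System.out.println(run());
   }
}
```

In this sketch the commands queued before the failure (command-2, command-3)
never run, and the callback is the only task executed after shutdown. Handing
the callback to a separate executor instead would, under these assumptions,
lose the guarantee that it is ordered after any command still in flight.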
Issue Time Tracking
-------------------
Worklog Id: (was: 893144)
Time Spent: 5h 20m (was: 5h 10m)
> Connection Failure Race Conditions in AMQP and Core
> ---------------------------------------------------
>
> Key: ARTEMIS-4476
> URL: https://issues.apache.org/jira/browse/ARTEMIS-4476
> Project: ActiveMQ Artemis
> Issue Type: Task
> Reporter: Clebert Suconic
> Assignee: Clebert Suconic
> Priority: Major
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> Failure detection can race with the processing of client packets (or frames,
> in the case of AMQP).
> This is because Netty detects the failure and removes the connection objects
> while packets are still being processed.
> I was not able to reproduce this particular issue, but I have seen a case
> in a memory dump where a consumer was created after the connection had
> already been dropped, leaving the consumer isolated without any communication
> with clients.
> I can see how these races would make that particular case possible.
> I am adding tests to exercise connection failure under stress, and with them
> I was able to reproduce other issues.