[ 
https://issues.apache.org/jira/browse/ARTEMIS-4571?focusedWorklogId=900236&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-900236
 ]

ASF GitHub Bot logged work on ARTEMIS-4571:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Jan/24 21:14
            Start Date: 17/Jan/24 21:14
    Worklog Time Spent: 10m 
      Work Description: jbertram opened a new pull request, #4745:
URL: https://github.com/apache/activemq-artemis/pull/4745

   There is a race condition between ConnectionEntry.ttl and 
FailureCheckAndFlushThread whereby an in-vm connection may get closed 
inadvertently due to a TTL timeout. This is because ConnectionEntry.ttl is 
initialized to 60000 and then later set to -1 upon the initial Ping. If this 
update happens at *just* the right time in FailureCheckAndFlushThread then the 
connection will be closed.
   
   The fix ensures that the ConnectionEntry.ttl is set to -1 for in-vm 
connections from the start. It also eliminates the possibility of the race in 
FailureCheckAndFlushThread.
   
   This fix is based on static analysis of the code. The timing window is just 
too small to construct a reliable test. The failure has only been seen in the 
wild a handful of times.
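
   One common way to close this kind of window is to read the shared field once 
and make both decisions on that snapshot. The sketch below only illustrates 
that idea and is not the actual patch: the class TtlCheckSketch, the method 
isExpired, and the setter are hypothetical names, and the real 
FailureCheckAndFlushThread code may be structured differently.

```java
// Hypothetical illustration only; not the code from the pull request.
final class TtlCheckSketch {
    // Both fields may be written by another thread (e.g. when a Ping arrives).
    private volatile long ttl;
    private volatile long lastCheck;

    TtlCheckSketch(long ttl, long lastCheck) {
        this.ttl = ttl;
        this.lastCheck = lastCheck;
    }

    void setTtl(long newTtl) {
        this.ttl = newTtl;
    }

    // Reads ttl exactly once, so the "-1 means no timeout" test and the
    // expiry arithmetic always see the same value.
    boolean isExpired(long now) {
        final long ttlSnapshot = ttl;
        if (ttlSnapshot == -1) {
            return false; // TTL checking disabled for this connection
        }
        return now >= lastCheck + ttlSnapshot;
    }

    public static void main(String[] args) {
        TtlCheckSketch entry = new TtlCheckSketch(60000L, 1000L);
        System.out.println(entry.isExpired(1000L));   // not yet expired
        entry.setTtl(-1L);                            // what the initial Ping does
        System.out.println(entry.isExpired(999999L)); // never expires with -1
    }
}
```

   With a single read, a concurrent update to -1 can change the outcome only 
from "check the deadline" to "skip the check", never to a spurious timeout.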




Issue Time Tracking
-------------------

            Worklog Id:     (was: 900236)
    Remaining Estimate: 0h
            Time Spent: 10m

> Race condition w/TTL impacting in-vm connections
> ------------------------------------------------
>
>                 Key: ARTEMIS-4571
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-4571
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>            Reporter: Justin Bertram
>            Assignee: Justin Bertram
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following WARN can occur due to a race condition between the 
> initialization of 
> {{org.apache.activemq.artemis.spi.core.protocol.ConnectionEntry}} and the 
> periodic check by {{RemotingServiceImpl$FailureCheckAndFlushThread}}:
> {noformat}
> AMQ212037: Connection failure to invm:0 has been detected: AMQ229014: Did not 
> receive data from invm:0 within the -1ms connection TTL. The connection will 
> now be closed. [code=CONNECTION_TIMEDOUT]{noformat}
> Also, the following ERROR message can happen at the same time:
> {noformat}
> ActiveMQNotConnectedException[errorType=NOT_CONNECTED message=AMQ219010: 
> Connection is destroyed]{noformat}
> Internally, the 
> {{org.apache.activemq.artemis.spi.core.protocol.ConnectionEntry}} is subject 
> to the following race condition:
> # {{ConnectionEntry}} is initialized with the default 
> {{ActiveMQClient.DEFAULT_CONNECTION_TTL}} (60000) at 
> {{CoreProtocolManager#createConnectionEntry()}}
> # {{RemotingServiceImpl$FailureCheckAndFlushThread}} evaluates {{if 
> (entry.ttl != -1)}} as {{true}}.
> # A {{Ping}} is sent. Then {{ttl}} is updated to 
> {{ActiveMQClient.DEFAULT_CONNECTION_TTL_INVM}} (-1).
> # {{RemotingServiceImpl$FailureCheckAndFlushThread}} checks {{if (now >= 
> entry.lastCheck + entry.ttl)}}. Since {{ttl}} has been updated to {{-1}} the 
> check passes (= expired) and the connection will be added to {{toRemove}}.
> # The WARN and ERROR occur.
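
The interleaving above can be replayed deterministically in a single thread. 
This is a minimal sketch, not Artemis code: EntrySketch and TtlRaceReplay are 
hypothetical stand-ins, only the two field names and the two conditions are 
taken from the description, and the concurrent Ping update is simulated by an 
ordinary assignment placed between the two checks.

```java
// Hypothetical stand-in for ConnectionEntry; only ttl and lastCheck are modelled.
final class EntrySketch {
    long ttl;
    long lastCheck;

    EntrySketch(long ttl, long lastCheck) {
        this.ttl = ttl;
        this.lastCheck = lastCheck;
    }
}

final class TtlRaceReplay {
    static final long DEFAULT_CONNECTION_TTL = 60000L;   // value quoted in the issue
    static final long DEFAULT_CONNECTION_TTL_INVM = -1L; // value quoted in the issue

    // Returns true if the entry would be added to toRemove.
    // pingInterleaves simulates the Ping updating ttl between the two checks.
    static boolean wouldBeRemoved(long lastCheck, long now, boolean pingInterleaves) {
        // Step 1: the entry starts with the default TTL.
        EntrySketch entry = new EntrySketch(DEFAULT_CONNECTION_TTL, lastCheck);

        // Step 2: first read of ttl; 60000 != -1, so the branch is taken.
        if (entry.ttl != -1) {
            // Step 3: in the real system another thread does this.
            if (pingInterleaves) {
                entry.ttl = DEFAULT_CONNECTION_TTL_INVM;
            }
            // Step 4: second read of ttl; with -1 the deadline is lastCheck - 1.
            return now >= entry.lastCheck + entry.ttl;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(wouldBeRemoved(1000L, 1000L, false)); // prints false
        System.out.println(wouldBeRemoved(1000L, 1000L, true));  // prints true
    }
}
```

Because the second read sees -1, the deadline becomes lastCheck - 1, which is 
already in the past for any now at or after lastCheck. This is also why the 
WARN reports a "-1ms connection TTL".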



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
