[ 
https://issues.apache.org/activemq/browse/AMQ-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_40405
 ] 

Manuel Teira commented on AMQ-1146:
-----------------------------------

I'm having some issues where slow performance in the backend database seems to 
lead to InactivityIOExceptions. My opinion is that the fix for this issue is 
causing those problems: 

Having in mind that the design of the readCheck/writeCheck monitors is that we 
always should have a command among two readCheck (produced by normal activity 
or by the response of a KeepAlive injected by the writeCheck between two 
readChecks if needed), synchronizing on the readChecker variable seems a bad 
idea:

If the chain of transportListener.onCommand() is lasting more than the 
negociated MaxInactivityDuration, we will end with two readCheck executing one 
after the other, without a writeCheck between them. The first readCheck 
success, sets the commandReceived to false, and the second one is always going 
to fail, as no new command has been processed, nor a writeCheck has been 
executed since the last readCheck.

I don't see the need of synchronizing on the readChecker/writeChecker 
variables. These are AtomicBooleans and should be safe to access them from 
different threads (this seemed to be the motivation of the fix). Also, 
synchronizing the readCheck/writeCheck is actually delaying the execution of 
the check, and we cannot guarantee anymore the alternating of writeCheck 
readCheck execution, that is needed for the solution to work.

Also, synchronizing that is avoiding this code in readCheck() :

        synchronized (readChecker) {
            if (inReceive.get()) {
                LOG.trace("A receive is in progress");
                return;
            }


to be reached: because inReceive is only set during onCommand, and that is 
synchronized on readChecker also. Furthermore, the right way to do this is 
returning here when a onCommand is performed at the same time, and not waiting, 
to not break the sequence of readCheck/writeCheck invocations.


In few words, I think that this change breaks the InactivityMonitor, and my 
opinion is that it should be rolled back. Also in the 4.1 subtask that I opened 
some time ago. :-(

Any comments?



> InactivityMonitor: Transport connection disconnected "Channel was inactive 
> for too long."
> -----------------------------------------------------------------------------------------
>
>                 Key: AMQ-1146
>                 URL: https://issues.apache.org/activemq/browse/AMQ-1146
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 4.1.0
>         Environment: Windows Server 2003 running broker and Windows XP 
> running client. Java 1.5.0_08 
>            Reporter: Mari
>            Assignee: Hiram Chirino
>            Priority: Minor
>             Fix For: 5.0.0
>
>         Attachments: InactivityMonitor_changed.java, 
> InactivityMonitor_Original.java
>
>   Original Estimate: 1 hour
>  Remaining Estimate: 1 hour
>
> Sometimes (when the transport is idle for long) the transport gets 
> disconnected with the message "Channel was inactive for too long." 
>         Apparently this is same as reported in 
> http://www.nabble.com/Channel-was-inactive-for-too-long-t1463069.html#a3971744
>  and 
> http://www.nabble.com/Durable-consumer-reconnect-problem-tf1716817.html#a6147014
>  
> Version: 4.1 
> Background: 
>         The class org.apache.activemq.transport.InactivityMonitor runs 
> monitoring threads to check the read and write of the transport(TCP for 
> example). If it hasn't sent the message during the given period, it sends a 
> KeepAliveInfo to ensure the connectivity. If it doesn't receive the message 
> for a given duration, then it disconnects the transport. 
>         By default, the maximumInactivityDuration is 30000 milliseconds. This 
> is the time interval between the check for read. That is if an end of the 
> transport doesn't receive any message for this period, then it will close the 
> connection. For the read check, the time interval is half of 
> maximumInactivityDuration i.e. 15000 millisecond. That is if a message was 
> not sent during this period, it will send a KeepAliveInfo to ensure the 
> connectivity. 
> Problem and the recommended fix: 
>         The InactivityMonitor class uses the flags inReceive and 
> commandReceived variables in methods readCheck and onCommand. Typically these 
> two methods are called from different threads. Also, the flags inSend and 
> commandSent are used in methods oneway and writeCheck. Again these two 
> methods are called from different threads. But, the synchronization was 
> missing for these. This seems to be a potential cause for the problem. So, 
> synchronization has been incorporated for these flag usages. 
>         The same class InactivityMonitor is used at both client and server 
> side. In general, the write check (which sends KeepAliveInfo) happens twice 
> during the time when the read check happens once. Probably the client and 
> server machines have performance differences and just to be safer, now the 
> write check is made to happen thrice during the time when the read check 
> happens once. 
>         The original and the changed source files are attached here. 
> http://www.nabble.com/file/5958/InactivityMonitor_changed.java
> http://www.nabble.com/file/5957/InactivityMonitor_Original.java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to