[ 
https://issues.apache.org/jira/browse/AMQ-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095984#comment-16095984
 ] 

Martin Lichtin commented on AMQ-6666:
-------------------------------------

With 5.14.5 I'm no longer seeing exactly the issue described above. Failover 
seems to work better.
Perhaps it's also due to upgrading the Spring framework from 3.2 to 4.3.
But there are some issues that I still need to look into in more detail, such 
as in-doubt transactions and duplicate messages.


> Failover Transport - send timeout not working
> ---------------------------------------------
>
>                 Key: AMQ-6666
>                 URL: https://issues.apache.org/jira/browse/AMQ-6666
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.14.0
>            Reporter: Martin Lichtin
>            Priority: Critical
>
> Running into a situation with the Failover Transport not respecting the 
> timeout that's been set. The symptom is endless messages of this kind:
> {noformat}
> 2017-04-29 09:48:26,128 | TRACE | .engine.cfgengine.in]-11 | 
> FailoverTransport                | sport.failover.FailoverTransport 615 | 81 
> - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to 
> reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, 
> type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = 
> XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
> 2017-04-29 09:48:26,228 | TRACE | .engine.cfgengine.in]-11 | 
> FailoverTransport                | sport.failover.FailoverTransport 615 | 81 
> - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to 
> reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, 
> type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = 
> XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
> 2017-04-29 09:48:26,329 | TRACE | .engine.cfgengine.in]-11 | 
> FailoverTransport                | sport.failover.FailoverTransport 615 | 81 
> - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to 
> reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, 
> type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = 
> XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
> 2017-04-29 09:48:26,429 | TRACE | .engine.cfgengine.in]-11 | 
> FailoverTransport                | sport.failover.FailoverTransport 615 | 81 
> - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to 
> reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, 
> type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = 
> XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
> 2017-04-29 09:48:26,530 | TRACE | .engine.cfgengine.in]-11 | 
> FailoverTransport                | sport.failover.FailoverTransport 615 | 81 
> - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to 
> reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, 
> type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = 
> XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
> ...
> 2017-04-29 09:48:33,270 | TRACE | .engine.cfgengine.in]-11 | 
> FailoverTransport                | sport.failover.FailoverTransport 615 | 81 
> - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to 
> reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, 
> type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = 
> XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
> 2017-04-29 09:48:33,371 | TRACE | .engine.cfgengine.in]-11 | 
> FailoverTransport                | sport.failover.FailoverTransport 615 | 81 
> - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to 
> reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, 
> type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = 
> XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
> {noformat}
> The code seems to never get out of this loop:
> {noformat}
>     while (transport == null && !disposed && connectionFailure == null
>             && !Thread.currentThread().isInterrupted() && willReconnect()) {
>         LOG.trace("Waiting for transport to reconnect..: {}", command);
>         long end = System.currentTimeMillis();
>         if (command.isMessage() && timeout > 0 && (end - start > timeout)) {
>             timedout = true;
>             LOG.info("Failover timed out after {} ms", (end - start));
>             break;
>         }
>         try {
>             reconnectMutex.wait(100);
>         } catch (InterruptedException e) {
>             Thread.currentThread().interrupt();
>             LOG.debug("Interupted:", e);
>         }
>         transport = connectedTransport.get();
>     }
> {noformat}
> The timeout is set to 5000ms and should have been hit long ago, but because 
> "command.isMessage()" returns false for this TransactionInfo command, the 
> timeout check is never evaluated and the thread stays inside the loop 
> indefinitely.
> The "command.isMessage()" condition should likely be removed.
> Currently running tests with a patched ActiveMQ, and the situation has 
> improved and fail-over seems to have worked (mostly). (Only seeing an issue 
> with a topic consumer that has not reconnected.)
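
For illustration only, the behaviour described in the quoted report can be reproduced with a small stand-alone simulation. This is a sketch under stated assumptions, not the ActiveMQ code or the actual patch: the class FailoverWaitSketch, the method waitForTransport, the onlyForMessages flag and the maxIterations safety cap are all hypothetical names, and time is simulated in 100 ms steps instead of calling reconnectMutex.wait(100).

```java
public class FailoverWaitSketch {

    /** Outcome of one simulated wait. */
    enum Outcome { TIMED_OUT, STILL_WAITING }

    /**
     * Simulates the wait loop for the case where no transport ever comes back.
     * Time is simulated (100 ms per iteration) so the run is deterministic.
     *
     * @param isMessage       stands in for command.isMessage()
     * @param timeoutMs       configured failover timeout in milliseconds
     * @param onlyForMessages true mimics the quoted condition;
     *                        false drops the isMessage() check (hypothetical variant)
     * @param maxIterations   safety cap so the sketch itself cannot spin forever
     */
    static Outcome waitForTransport(boolean isMessage, long timeoutMs,
                                    boolean onlyForMessages, int maxIterations) {
        long start = 0;
        long now = 0;
        for (int i = 0; i < maxIterations; i++) {
            // With onlyForMessages set, non-message commands never reach the timeout check.
            boolean eligible = !onlyForMessages || isMessage;
            if (eligible && timeoutMs > 0 && (now - start > timeoutMs)) {
                return Outcome.TIMED_OUT;
            }
            now += 100; // stands in for reconnectMutex.wait(100)
        }
        return Outcome.STILL_WAITING;
    }

    public static void main(String[] args) {
        // A TransactionInfo is not a message, so the quoted condition never fires.
        System.out.println(waitForTransport(false, 5000, true, 1000));
        // Without the isMessage() check the same command gives up after the timeout.
        System.out.println(waitForTransport(false, 5000, false, 1000));
    }
}
```

With the message-only condition, the non-message command is still waiting after 100 simulated seconds. With the condition dropped, it times out once the simulated elapsed time exceeds 5000 ms.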



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
