Bridge connection lost, client side tries to reconnect for a while then stops 
trying (HTTPS through a proxy server)
-------------------------------------------------------------------------------------------------------------------

                 Key: AMQ-3522
                 URL: https://issues.apache.org/jira/browse/AMQ-3522
             Project: ActiveMQ
          Issue Type: Bug
    Affects Versions: 5.4.2
         Environment: We have a hub-and-spoke bridge architecture where the hub 
is the client that connects (over HTTPS) to seventy some servers. A proxy 
server resides at each server node thus the connection is https over a proxy. 
The client side of the bridge is running on Linux (CentOS), the server side of 
bridge is running on Windows 2003. The proxy server is Apache HTTP server. All 
of this runs over a VPN.

The specific amq build we are using is from FUSE, namely, 5.4.2-fuse-03-09.  On 
the client side, we have amq logging set as follows:

log4j.rootLogger=INFO, console, logfile
log4j.logger.org.apache.activemq.spring=WARN
log4j.logger.org.apache.activemq.web.handler=WARN
log4j.logger.org.springframework=WARN
log4j.logger.org.apache.xbean=WARN
log4j.logger.org.apache.camel=INFO
log4j.logger.org.apache.activemq.network.DiscoveryNetworkConnector=DEBUG
            Reporter: Gregg Saffell
         Attachments: client-activemq.log, server-access.log, server-wrapper.log

Each server node has a queue that is dedicated to messages going from the 
server to the client. That queue is therefore expected always to have one 
consumer (the one that does the bridge forwarding). At least two times per 
week, we detect that one of the servers shows zero consumers on that queue (in 
which case messages are not being forwarded across the bridge) and this 
persists until we stop and restart the client amq instance. Examination of the 
logs shows that the client side detected the connection had been lost and it 
retried to establish it for a time but repeatedly failed and then seemed to 
give up. The wrapper.log file on the server side shows "IDLE 
SCEP@...[d=false,io=1,w=true,rb=false,wb=false]" for the period where the 
client is not able to re-establish the connection.

The client-side network connector definition is as follows: 

<networkConnector name="ECENTRE_TSOLAR_EU" 
uri="static:(https://localhost:61617?   
readCheckTime=20000&amp;initialDelayTime=4000&amp;keepAliveResponseRequired=true&amp;proxyHost=ecentre.tsolar.eu&amp;proxyPort=80)?useExponentialBackOff=false&amp;initialReconnectDelay=5000&amp;maxReconnectDelay=5000"
 duplex="true">
    <dynamicallyIncludedDestinations>
        <queue physicalName="TO_EC.ECENTRE_TSOLAR_EU.&gt;"/>
        <queue physicalName="TO_SCA.&gt;"/>
    </dynamicallyIncludedDestinations>
        <staticallyIncludedDestinations>
                        <queue 
physicalName="TO_EC.ECENTRE_TSOLAR_EU.ASYNC_MSGS"/>
                        <queue 
physicalName="TO_EC.ECENTRE_TSOLAR_EU.SYNC_RESP"/>
                        <queue physicalName="TO_SCA.ASYNC_MSGS"/>
                        <queue physicalName="TO_SCA.ADMIN.SYNC_REQ"/>
                        <queue physicalName="TO_SCA.SSO.SYNC_REQ"/>
        </staticallyIncludedDestinations>
</networkConnector>


The server-side transport connector definition is as follows:

        <transportConnectors>
            <transportConnector name="https" 
uri="https://127.0.0.1:61617?transport.readCheckTime=20000&amp;transport.initialDelayTime=20000&amp;transport.keepAliveResponseRequired=true&amp;needClientAuth=true"/>
        </transportConnectors>

The proxy server resides on the same machine and is configured with the 
following virtual host:

<VirtualHost _default_:80>
      ErrorLog logs/default80.ecentre_error.log

      ProxyRequests on
      AllowCONNECT 61617
      ProxyVia on
      LogLevel warn

      RewriteEngine on
      RewriteCond %{REQUEST_METHOD} (GET|POST|HEAD|PUT|DELETE)
      RewriteCond %{REQUEST_URI} !^/eCentreToolServices*
      RewriteCond %{REQUEST_URI} !^/eCentreServices*
      RewriteRule ^(.*)$ https://%{SERVER_NAME}$1 [L,R,NC]
      
      #Take out the TRACE and TRACK
      RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
      RewriteRule .* -[F]

</VirtualHost>

I'm about to attach amq logs from both the client and server along with the 
apache access log.  The specific remote server that had the problem is 
ecentre.tsolar.eu. Allow me to direct your attention to some specific entries: 

-------------------
client-activemq.log
-------------------
2011-09-30 07:46:33,357 | WARN -- connection to tsolar shutdown due to a remote 
error: java.io.IOException: Could not post command: KeepAliveInfo {} due to: 
java.net.SocketException: Connection reset

(several attempts to re-establish the connection to tsolar)

2011-09-30 07:48:29,566 | INFO -- the last attempt is made to re-establish the 
connection to tsolar

2011-09-30 14:06:34,816 | INFO -- we restart amq 

2011-09-30 14:07:10,646 | INFO -- connection established to tsolar

-------------------
server-wrapper.log
-------------------
(Note: the server node's clock is around 5 hours 50 minutes ahead of the 
client's)
The tsolar server 

INFO   | jvm 1    | 2011/09/30 13:38:06 |  WARN | Stopping an existing active 
duplex connection [Transport Connection to: blockingQueue_8847135] for network 
connector 
(ECENTRE_TSOLAR_EU@ID:amq-us01.ilstechnology.net-44428-1317330422396-0:1).

(connection is stopped and re-established several times)

INFO   | jvm 1    | 2011/09/30 13:39:57 |  INFO - Network connection shutdown 
due to inactivity timeout

(note the series of "IDLE SCEP@... [d=false,io=1,w=true,rb=false,wb=false]" 
entries here, not sure if they are relevant, they don't appear in activemq.log, 
only in wrapper.log)

INFO   | jvm 1    | 2011/09/30 19:58:38 |  INFO - connection re-established


------------------------
server apache access log
------------------------
216.244.116.13 - - [30/Sep/2011:13:38:05 +0200] -- begins a series of rapid 
"CONNECT localhost:61617 HTTP/1.1" 200 entries

216.244.116.13 - - [30/Sep/2011:13:40:04 +0200] -- last in series of rapid 
"CONNECT localhost:61617 HTTP/1.1" 200 entries

216.244.116.13 - - [30/Sep/2011:19:58:34 +0200] -- the next "CONNECT 
localhost:61617 HTTP/1.1" 200 entry


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to