Hi,

we have the following setup:

HttpClient 3.0.1 --> HWLB --> Apache Synapse 1.2 --> AS hosting web services

We encounter a number of java.net.SocketException: Connection reset errors while performing a graceful restart of Apache Synapse 1.2. By default, HttpClient retries some of these requests and not others. We have not changed the default implementation of the retry handler. According to my current understanding, only non-completed requests will be retried (which is generally a very good idea, as most of the involved backend operations are not idempotent and cannot be retried safely).

Before trying to modify the default behaviour by providing a custom implementation of a retry handler, we would first like to understand the exact cause of the connection reset in this particular case. Generally the reason should be an "unexpected" server-side connection close.

From a high-level perspective, the following happens during a graceful restart of Synapse:

1. Graceful shutdown request is received
2. Request pause of any listener, sender or task threads
3. Acknowledgement of pause execution
4. Periodically check current processing of already accepted requests (accepted before completion of the listener pause) and wait until all threads are idle and there are no active connections
5. End of graceful period
6. Restart of instance

We encounter the connection resets in the phase between steps 3 and 5.

First we tried to understand whether only persistent connections are involved. So we disabled keepalive on the HttpClient side. This did not change the situation. We then also disabled keepalive on the ESB side, between the ESB and the service. We still received those exceptions on the server side.

We are not quite sure how the HWLB affects the problem. It generally notices the listener pause and directs traffic to the other Apache Synapse nodes.

In this environment we are not able to capture traffic via tcpmon, but we could activate wire logs on both the HttpClient and the Synapse end.

Can anyone please point us in the right direction for analyzing and understanding the cause of the connection reset?

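For the wire logs on the HttpClient side, we assume something along the following lines would be sufficient. This is only a sketch based on our reading of the HttpClient 3.x logging guide. The class name WireLogSetup is ours, and we assume that commons-logging is told to use its SimpleLog implementation; with Log4j the same categories would have to be configured there instead.

```java
// Sketch only: WireLogSetup is a hypothetical helper name, not part of HttpClient.
// Assumes commons-logging with SimpleLog; must run before the first HttpClient call.
final class WireLogSetup {

    static void enable() {
        // Tell commons-logging to use SimpleLog (writes to System.err).
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        // Timestamps help to correlate client entries with the Synapse restart phases.
        System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
        // Wire category: raw bytes sent to and received from the connection.
        System.setProperty("org.apache.commons.logging.simplelog.log.httpclient.wire", "debug");
        // Context category: connection handling and retry decisions.
        System.setProperty(
                "org.apache.commons.logging.simplelog.log.org.apache.commons.httpclient",
                "debug");
    }
}
```

Calling WireLogSetup.enable() once at client startup should then show, per request, whether the reset happens before or after the request has been written to the connection.
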
Once this part is done we need to decide how to handle this properly. We would like to lose no request, and we must not mistakenly retry a request that might already have been processed.

Any help on this is greatly appreciated.

Thanks a lot,
Eric
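
P.S. To make our current understanding of the retry behaviour explicit, here is a simplified model of the decision as we read it. It is not the HttpClient API: the class and method names are hypothetical, and the retry limit of 3 is our assumption about the default. Please correct us if this picture is wrong.

```java
// Simplified model of our understanding, NOT HttpClient code.
// RetryDecision, shouldRetry and ASSUMED_MAX_RETRIES are hypothetical names.
final class RetryDecision {

    // Assumption: the default handler gives up after 3 attempts.
    static final int ASSUMED_MAX_RETRIES = 3;

    /**
     * @param executionCount   number of attempts made so far (1 after the first failure)
     * @param requestFullySent whether the request was completely written to the connection
     * @return true if, in this model, the request would be retried
     */
    static boolean shouldRetry(int executionCount, boolean requestFullySent) {
        if (executionCount > ASSUMED_MAX_RETRIES) {
            return false; // retry budget exhausted
        }
        // A fully sent request may already have been processed by the backend,
        // so it is not retried; only non-completed requests are.
        return !requestFullySent;
    }
}
```

In this model a reset that arrives after the request was fully sent is never retried, which matches what we want for non-idempotent operations but also means such a request is lost unless the caller handles it.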
