We have a (not-so-micro-anymore) services implementation where the services communicate with each other using Jersey Client. The default configuration works just fine in regular tests.
However, we have some system tests that run one after another, including some heavy-load tests. Some of these tests now fail with "Connection reset" from Jersey Client. We have been changing the Dropwizard configuration to remedy this problem on every release, as there has always been some setting that was not working or only half implemented in Dropwizard, IIRC.

I believe the problem typically comes down to stale connections in the connection pool, with some tests causing one of the services to end up using them. I think, at the time, the HttpClient `validateAfterInactivityPeriod` and `retries` settings were either not supported or not functioning as they should. So we ended up using this configuration between services:

    jerseyClient:
      timeToLive: 15 minutes

    applicationConnectors:
      - type: http
        idleTimeout: 15 minutes

Strangely, this was working fine. I think `timeToLive` was also acting as `keepAlive` at the time, and `keepAlive` was not working as it should, IIRC. (It was a long time ago, so the details may be rather wrong.) The idea is to keep the inter-service connections open for 15 minutes (for performance) and to have an understanding between the services about when to kill a connection, so they don't have to bother validating connections.

This worked until 1.0.0. With 1.0.0, the "Connection reset" errors have come back. It's rather hard to isolate the problem and make it simple to reproduce, but I assume it's still the stale connection issue.

I'd like to avoid using `validateAfterInactivityPeriod` and `retries`, as I'm afraid they might affect performance badly. (By the way, `retries` doesn't work out of the box with Jersey Client; it needs `config.property(ClientProperties.REQUEST_ENTITY_PROCESSING, RequestEntityProcessing.BUFFERED);`.) I have also tried setting `keepAlive` to 15 minutes, but that didn't help.

Any ideas what might have gone wrong? Or am I being too uptight about `retries` and `validateAfterInactivityPeriod`?
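
To make the reasoning behind the matching 15-minute values concrete, here is a rough sketch of how I picture it. This is a toy model only, not Dropwizard or Apache HttpClient code: the class `StaleConnectionModel` and its methods are hypothetical names. It assumes the client reuses a pooled connection while its age is below `timeToLive`, and that the server closes a connection once it has been idle for `idleTimeout`. It ignores clock skew, in-flight requests, and anything else that might close a connection.

```java
import java.time.Duration;

/** Hypothetical toy model of the stale-connection reasoning; not real library code. */
class StaleConnectionModel {

    /** Assumed client rule: reuse a pooled connection while its age is below timeToLive. */
    static boolean clientWouldReuse(Duration age, Duration timeToLive) {
        return age.compareTo(timeToLive) < 0;
    }

    /** Assumed server rule: close a connection once it has been idle for idleTimeout. */
    static boolean serverHasClosed(Duration idle, Duration idleTimeout) {
        return idle.compareTo(idleTimeout) >= 0;
    }

    /** Stale means: the client would still reuse it, but the server has already closed it. */
    static boolean isStale(Duration age, Duration idle,
                           Duration timeToLive, Duration idleTimeout) {
        if (idle.compareTo(age) > 0) {
            throw new IllegalArgumentException("idle time cannot exceed connection age");
        }
        return clientWouldReuse(age, timeToLive) && serverHasClosed(idle, idleTimeout);
    }

    public static void main(String[] args) {
        Duration fifteen = Duration.ofMinutes(15);

        // Matching values: idle <= age < timeToLive == idleTimeout, so never stale here.
        System.out.println(isStale(Duration.ofMinutes(14), Duration.ofMinutes(14),
                fifteen, fifteen)); // false

        // Client timeToLive longer than server idleTimeout: a stale connection is possible.
        System.out.println(isStale(Duration.ofMinutes(20), Duration.ofMinutes(16),
                Duration.ofMinutes(30), fifteen)); // true
    }
}
```

Under these assumptions, equal values can only go wrong right at the 15-minute boundary, which may be why the setup used to work. It does not explain what changed in 1.0.0.
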
I will enable retries eventually for the sake of safety, but would prefer not to have it as a primary method to fix this issue. (Also I'm not sure how buffering entities would affect the performance).
