Hello Natan,

From which version of Dropwizard are you migrating to 1.0.0?

Dropwizard's Jersey client implementation uses Apache's HTTP client. Before HttpClient 4.4 (Dropwizard 0.9), the client checked every connection in the pool for staleness before re-using it. That changed in 4.4, and now a connection is checked only after the `timeToLive` period. So, if the TTL on the client side is the same as the idle timeout on the server, there can be situations where the server sends an RST while the connection is still sitting in the client's pool.

You could try setting the client timeout a little lower than the server's and see if that helps. Alternatively, you could try setting `validateAfterInactivity`, but that will help only if the issue happens with inactive connections that have been released back to the pool (which is probably not your case).

Just a shot in the dark.
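For illustration, the first suggestion might look like the sketch below. The 14-minute value is an arbitrary assumption, chosen only because it is shorter than the 15-minute `idleTimeout` from your configuration. The `server:` nesting and the commented-out `validateAfterInactivityPeriod` value are assumptions as well, so check the key names against the configuration reference for your Dropwizard version.

```yaml
# Client side: let pooled connections expire before the server drops them.
jerseyClient:
  timeToLive: 14 minutes   # assumed value, just below the server's idleTimeout
  # Alternative (assumed value): re-validate connections that sat idle in the pool.
  # validateAfterInactivityPeriod: 5 seconds

# Server side: unchanged from the original setup.
server:
  applicationConnectors:
    - type: http
      idleTimeout: 15 minutes
```

The point is only that the client gives up on a connection first, so it never picks one that the server has already closed.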
Artem

On Monday, August 8, 2016 at 1:31:19 PM UTC+2, Natan Abolafya wrote:
>
> We have a (not-so-micro-anymore) services implementation where the
> services communicate with each other using Jersey Client. The default
> configuration always works just fine with a regular test.
>
> However, we have some system tests that are run one after another,
> including some heavy-load tests. Some of the tests now fail with
> "Connection Reset" from Jersey Client. We have been changing the Dropwizard
> configuration to remedy this problem on every release, as there has always
> been some configuration option not working, or half implemented in
> Dropwizard, IIRC.
>
> I believe, typically, the problem comes down to having stale connections
> in the connection pool, and some tests making one of the services end up
> using these stale connections. I think, at the time, the HttpClient
> `validateAfterInactivityPeriod` and `retries` configuration options were
> not supported or were not functioning as they should. So we ended up using
> this configuration between services:
>
> jerseyClient:
>   timeToLive: 15 minutes
>
> applicationConnectors:
>   - type: http
>     idleTimeout: 15 minutes
>
> This was, strangely, working fine. I think `timeToLive` was also acting as
> `keepAlive` at the time, and `keepAlive` was not working as it should, IIRC.
> (It was a long time ago, so the details may be rather wrong.) The idea is
> to keep the inter-service connections open for 15 minutes (for performance),
> and have an understanding between services about when to kill the
> connection, so they wouldn't bother validating the connections. This was
> working until 1.0.0.
>
> With 1.0.0, "Connection reset" errors have come back. It's rather hard to
> isolate the problem and make it simple to reproduce, but I assume it's still
> the stale connection issue. I'd like to avoid using
> `validateAfterInactivityPeriod` and `retries` (which doesn't work out of the
> box with Jersey Client, by the way; it needs
> config.property(ClientProperties.REQUEST_ENTITY_PROCESSING,
> RequestEntityProcessing.BUFFERED);) as I'm afraid it might affect the
> performance badly. I have tried to set `keepAlive` also to 15 minutes, but
> that didn't help.
>
> Any ideas what might have gone wrong? Or am I being too uptight about
> `retries` and `validateAfterInactivityPeriod`? I will enable retries
> eventually for the sake of safety, but would prefer not to have it as the
> primary method to fix this issue. (Also, I'm not sure how buffering entities
> would affect the performance.)
