My understanding is that some services were affected much more than others. One service was seeing all of its integration tests failing; another was seeing 50% of its calls failing; another was seeing a significant but manageable number of request failures. My impression is that affected services all had relatively (but not absurdly) low request rates.
It appears that higher latency was caused by retries, not overall slower transport performance.

On Fri, Mar 12, 2021 at 12:18 AM Oleg Kalnichevski <[email protected]> wrote:
> On Thu, 2021-03-11 at 18:00 -0800, Ryan Schmitt wrote:
> > On Saturday we rolled out a company-wide upgrade from Apache client
> > 4.5.13 to 5.0.3, and yesterday we ended up rolling it back due to
> > several services reporting significant increases in client-side
> > latency and request failures due to NoHttpResponseException. Can
> > someone suggest a good place to start looking for the root cause?
>
> Hi Ryan
>
> This is quite disappointing. Do I understand it correctly that only
> some services were affected and exhibiting the problem while some were
> not?
>
> Were NoHttpResponseException thrown due to overall slower transport
> performance or was higher latency caused by request re-execution due to
> NoHttpResponseException?
>
> My guts tell me that the most likely cause of it could be changes in
> connection pooling. This is where I would start looking.
>
> Naturally there is no way of telling anything for sure without being
> able to reproduce the problem in a controlled environment.
>
> Please let me know if there is anything I could do to help.
>
> Oleg
