briaugenreich opened a new pull request, #4900: URL: https://github.com/apache/hbase/pull/4900
This only affects the Table implementation in 2.x releases.

This changes the exception thrown, and the failure response returned, when a multiget exceeds its operation timeout. Instead of clearing the meta cache, we skip the cache clear and simply mark each get as failed. This ensures we do not create a feedback loop that is impossible to recover from.

If meta is overloaded, or if you send a sufficiently large batch of actions, resolving the HRegionLocations (which happens sequentially) may take a while. Depending on the operation timeout configured for the client, that duration may already exceed the timeout before the request even reaches CancellableRegionServerCallable.call(). When the timeout is detected there, a DoNotRetryIOException is thrown. This is treated as a cache-clearing exception, so any locations that were slowly resolved earlier in the call chain are thrown away, and the next attempt has to resolve them all over again. With enough concurrency, this can create a feedback loop that is impossible to recover from.

cc: @bbeaudreault
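To make the feedback loop concrete, here is a minimal, self-contained simulation. It is a sketch under stated assumptions, not HBase code: the classes `LocationCache`, `Clock` and the method `multiGet` are hypothetical stand-ins for the meta cache, elapsed time and the batch path. Locations are resolved sequentially at a fixed simulated cost per meta lookup, and the operation timeout is checked afterwards, before the simulated RPC. The `clearCacheOnTimeout` flag switches between the old behavior (cache-clearing exception) and the behavior described above (mark each get as failed, keep the cache).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a batch-get client; none of these types are HBase APIs.
public class MultiGetTimeoutSketch {

    /** Simulated clock so the example is deterministic (no real sleeping). */
    static final class Clock {
        long nowMs = 0;
    }

    /** Simulated per-row location cache, standing in for the meta cache. */
    static final class LocationCache {
        final Map<String, String> locations = new HashMap<>();
        int metaLookups = 0;

        // A cache miss costs one (slow) simulated meta lookup.
        String locate(String row, Clock clock, long metaLookupCostMs) {
            return locations.computeIfAbsent(row, r -> {
                metaLookups++;
                clock.nowMs += metaLookupCostMs;
                return "rs-for-" + r;
            });
        }
    }

    enum Result { SUCCESS, FAILED }

    /** One batch attempt: resolve locations sequentially, then check the timeout. */
    static List<Result> multiGet(List<String> rows, LocationCache cache, Clock clock,
                                 long metaLookupCostMs, long operationTimeoutMs,
                                 boolean clearCacheOnTimeout) {
        long start = clock.nowMs;
        for (String row : rows) {
            cache.locate(row, clock, metaLookupCostMs);
        }
        List<Result> results = new ArrayList<>();
        boolean timedOut = clock.nowMs - start > operationTimeoutMs;
        if (timedOut && clearCacheOnTimeout) {
            // Old behavior: the timeout surfaces as a cache-clearing error,
            // so every slowly resolved location is thrown away.
            cache.locations.clear();
        }
        // On timeout every get in the batch fails for this attempt;
        // otherwise every get succeeds.
        for (int i = 0; i < rows.size(); i++) {
            results.add(timedOut ? Result.FAILED : Result.SUCCESS);
        }
        return results;
    }

    /** Retries the same batch; returns the attempt that succeeded, or -1. */
    static int attemptsUntilSuccess(boolean clearCacheOnTimeout, int maxAttempts) {
        List<String> rows = List.of("a", "b", "c", "d", "e");
        LocationCache cache = new LocationCache();
        Clock clock = new Clock();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // 5 rows x 30 ms per meta lookup = 150 ms, above a 100 ms timeout.
            List<Result> r = multiGet(rows, cache, clock, 30, 100, clearCacheOnTimeout);
            if (!r.contains(Result.FAILED)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("old behavior: " + attemptsUntilSuccess(true, 5));
        System.out.println("new behavior: " + attemptsUntilSuccess(false, 5));
    }
}
```

Running `main` prints `old behavior: -1` and `new behavior: 2`. When the cache is cleared on timeout, every retry pays the full 150 ms of lookups again and never fits inside the 100 ms budget. When the gets are only marked as failed, the second attempt finds every location cached and succeeds.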
