[GitHub] [hbase] bbeaudreault commented on pull request #4180: HBASE-26807 Unify CallQueueTooBigException special pause with CallDroppedException

GitBox Tue, 29 Mar 2022 07:03:04 -0700


bbeaudreault commented on pull request #4180:
URL: https://github.com/apache/hbase/pull/4180#issuecomment-1081912070



   @ndimiduk great feedback as well, thanks! I agree with you on the relative 
ambiguity here. I'm wondering how we can make incremental progress though. In 
my opinion this PR is a step in the right direction, though still leaves lots 
of area for improvement as you and Andrew call out.
   
   If it were easy to generate a reasonable Retry-After-like return value, I'd 
be glad to give that a try here. There are a couple of problems I see:
   
   1. I'm not sure how we'd calculate that based on current implementations. I 
think maybe based on an EWMA rate of call queue drainage? We don't currently 
track that, and implementing it here seems out of scope for what started as a 
unification of exception handling in an existing feature.
   2. In my experience it's typically easier to handle backoff configuration on 
the client side. This is for two reasons:
     1. There are often many different clients, with different SLAs. These 
usually play out in the timeout, num tries, and backoff policies.
     2. Clients are typically easier to configure on-demand. HBase's 
`onConfigurationChange` hooks can help with this, but it's still relatively 
less agile than client configs imo.
   
   With all of that said, in terms of setting ourselves up for further 
iteration in future jiras, I wonder if the current implementation here is good 
enough as is or could be improved with that in mind. I'm open to suggestions on 
that, but here are my thoughts:
   
   - I don't think the addition of `server_overloaded` in protobuf is a 
problem, since we can change protobufs relatively freely. 
   - RemoteWithExtrasException is IA.Public, so if we add the serverOverloaded 
stuff here now we'd need to handle deprecations if we wanted to remove that in 
the future.
     - That said, I think any future improvement here might want to be 
predicated on first totally refactoring RemoteWithExtrasException, so we'd need 
to handle handle deprecations anyway
   - The bigger sticking point might be HBaseServerException, which now exposes 
an isServerOverloaded() method. IMO we could leave this here even we added 
further details down the road -- for example, there's little harm in having 
both an `isServerOverloaded()` _and_ `getRetryAfterMillis()`. One describes the 
state of the server, and the other describes what to do about it.
   
   Based on those thoughts, I personally think this PR is good as is without 
hampering our ability to improve this more down the line. That's not based on a 
desire to avoid extra work here though, so if you think there are some tactical 
changes we could make within the spirit of this jira please let me know!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hbase] bbeaudreault commented on pull request #4180: HBASE-26807 Unify CallQueueTooBigException special pause with CallDroppedException

Reply via email to