RaulGracia opened a new issue #2277: How to handle "not enough non-faulty 
Bookies" situation?
URL: https://github.com/apache/bookkeeper/issues/2277
 
 
   **QUESTION**
   
   We use Bookkeeper extensively in our project. While Bookkeeper generally 
provides good write performance, we noticed that under heavy load the 
Bookkeeper client may fail with errors such as `BKNotEnoughBookiesException: 
Not enough non-faulty bookies available`. 
   
   As I understand it, this problem may be caused by the lack of throttling 
between the Bookkeeper Client (4.8.2) and Server (4.9.2), which may lead the 
client to queue up too many requests and therefore overload the server. I 
reach this conclusion because the `BKNotEnoughBookiesException` is normally 
preceded by errors like `ERROR o.a.bookkeeper.client.PendingAddOp - Write of 
ledger entry to quorum failed: LXXX EYYY`, which appear after one of the 
Bookies has been "disconnected" during the high-load period (e.g., `INFO  
o.a.b.proto.PerChannelBookieClient - Disconnected from bookie channel` and 
`WARN  o.a.b.c.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies 
: excludeBookies`).
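
   To make the throttling idea concrete, here is a minimal sketch of capping 
the number of in-flight asynchronous writes on the caller's side. It is not 
Bookkeeper API: `ThrottledWriter`, `maxOutstanding`, and the `asyncWrite` 
function are hypothetical names used only for illustration, and the sketch 
assumes the underlying write returns a `CompletableFuture`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/** Caps the number of in-flight asynchronous writes (illustrative only, not Bookkeeper API). */
final class ThrottledWriter<T> {
    private final Semaphore permits;
    // Hypothetical asynchronous write function supplied by the caller.
    private final Function<byte[], CompletableFuture<T>> asyncWrite;

    ThrottledWriter(int maxOutstanding, Function<byte[], CompletableFuture<T>> asyncWrite) {
        this.permits = new Semaphore(maxOutstanding);
        this.asyncWrite = asyncWrite;
    }

    /** Blocks the caller while maxOutstanding writes are already pending. */
    CompletableFuture<T> write(byte[] data) {
        permits.acquireUninterruptibly();
        try {
            // Release the permit when the write completes, whether it succeeded or failed.
            return asyncWrite.apply(data).whenComplete((value, error) -> permits.release());
        } catch (RuntimeException e) {
            // The write could not even be submitted, so give the permit back.
            permits.release();
            throw e;
        }
    }

    /** Number of additional writes that can be submitted without blocking. */
    int availablePermits() {
        return permits.availablePermits();
    }
}
```

   With a wrapper like this, the producer slows down as soon as 
`maxOutstanding` writes are pending instead of queuing requests without bound. 
Whether this would actually avoid the disconnections described above is 
untested.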
   
   While I can understand that Bookies can be temporarily unresponsive under 
high load, my question is: _how do we handle this situation?_ 
Apparently, the Bookkeeper Client tags the overloaded Bookies as "faulty" and 
leaves them in that state, right? Is there a way for the Bookkeeper Client to 
use the Bookies classified as "faulty" again? I ask because, after inducing 
high load on a 3-Bookie ensemble and seeing this issue, the Bookies can still 
be used afterwards (they have not permanently crashed). However, the 
Bookkeeper Client is left in a state in which some of the Bookies are tagged 
as "faulty". 
   
   PS: I understand that "having more Bookies" could be a workaround, but my 
question is specifically about how to deal with the Bookkeeper Client when it 
quarantines a "faulty" Bookie and we want to use that Bookie again later on.
