RaulGracia opened a new issue #2277: How to handle "not enough non-faulty Bookies" situation?
URL: https://github.com/apache/bookkeeper/issues/2277

**QUESTION**

We use Bookkeeper extensively in our project. In general Bookkeeper provides good write performance, but we noticed that under very high load the Bookkeeper client may fail with errors such as `BKNotEnoughBookiesException: Not enough non-faulty bookies available`.

As I understand it, this problem may be caused by the lack of throttling between the Bookkeeper Client (4.8.2) and Server (4.9.2), which may lead the client to queue up too many requests and therefore overload the server. I reach this conclusion because the `BKNotEnoughBookiesException` is normally preceded by errors like `ERROR o.a.bookkeeper.client.PendingAddOp - Write of ledger entry to quorum failed: LXXX EYYY`, and because one of the Bookies has been "disconnected" during the high-load period (e.g., `INFO o.a.b.proto.PerChannelBookieClient - Disconnected from bookie channel` and `WARN o.a.b.c.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies`).

While I understand that Bookies can be temporarily unresponsive under high load, my question is: _how do we handle this situation?_ Apparently, the Bookkeeper Client tags the overloaded Bookies as "faulty" and leaves them in that state, right? Is there a way for the Bookkeeper Client to use Bookies classified as "faulty" again? I ask because, after inducing high load on a 3-Bookie ensemble and seeing this issue, the Bookies can be used afterwards (they have not permanently crashed). However, the Bookkeeper Client remains in a state in which some of the Bookies are tagged as "faulty".

PS: I understand that "having more Bookies" could be a workaround, but my question is specifically about how to deal with the Bookkeeper Client when it quarantines a "faulty" Bookie and we want to use that Bookie later on.
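
To make the desired behavior concrete, here is a minimal sketch of a time-bounded quarantine, in which a Bookie marked "faulty" becomes eligible again once a quarantine period has elapsed. This is an illustration only: `QuarantineTracker`, its methods, and the address strings are hypothetical names, and this is not a claim about how the Bookkeeper Client implements quarantining internally.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; not BookKeeper's implementation. */
class QuarantineTracker {
    private final long quarantineMillis;
    // Bookie address -> timestamp (ms) at which its quarantine ends.
    private final Map<String, Long> releaseAt = new ConcurrentHashMap<>();

    QuarantineTracker(long quarantineMillis) {
        this.quarantineMillis = quarantineMillis;
    }

    /** Mark a Bookie as faulty, starting at nowMillis. */
    void quarantine(String bookie, long nowMillis) {
        releaseAt.put(bookie, nowMillis + quarantineMillis);
    }

    /** A Bookie is usable if it was never quarantined or its quarantine has expired. */
    boolean isUsable(String bookie, long nowMillis) {
        Long until = releaseAt.get(bookie);
        if (until == null) {
            return true;
        }
        if (nowMillis >= until) {
            // Quarantine expired: forget the entry so the Bookie can be selected again.
            releaseAt.remove(bookie, until);
            return true;
        }
        return false;
    }
}
```

The clock is passed in explicitly so that the expiry logic can be exercised deterministically; a real client would presumably read the current time itself.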
