benjumanji opened a new issue, #4553:
URL: https://github.com/apache/bookkeeper/issues/4553

   I have the following config (shortened for brevity) on Pulsar 4.0.1:
   
   ```
   bookkeeperClientRegionawarePolicyEnabled=true
   reppRegionsToWrite=euw1-az3;euw1-az1;euw1-az2
   reppMinimumRegionsForDurability=2
   ```
   
   I have at least three bookies. If I use this configuration with ensemble size 3, write quorum 3, and ack quorum 2 (e3,w3,a2), then the exception at https://github.com/apache/bookkeeper/blob/0748423e3228f7cf61d2e1f2ab11e354ed84c0df/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RegionAwareEnsemblePlacementPolicy.java#L317 is thrown.
   
   <img width="1210" alt="Screenshot 2025-01-30 at 21 01 17" src="https://github.com/user-attachments/assets/001a603c-32ce-4d1f-aba9-fea20dd17032" />
   
   This makes little sense to me. The check `2 <= 3 - 3/2` does evaluate to true (integer division, so `3/2` is 1), which is why the exception fires, but I am failing to see _why_ this is a bad configuration.
   
   ```
               // We must survive the failure of numRegions - effectiveMinRegionsForDurability. When these
               // regions have failed we would spread the replicas over the remaining
               // effectiveMinRegionsForDurability regions; we have to make sure that the ack quorum is large
               // enough such that there is a configuration for spreading the replicas across
               // effectiveMinRegionsForDurability - 1 regions
   ```
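
   For reference, here is a minimal sketch of the check as I read it. The class and method names are hypothetical, and the mapping of `2 <= 3 - 3/2` onto ack quorum, write quorum, and minimum regions for durability is my assumption, not a copy of the BookKeeper source.

   ```java
   // Hypothetical helper, not BookKeeper API: mirrors the expression quoted in this issue.
   final class ReppQuorumCheck {
       /** Returns true when the quoted condition holds, i.e. the configuration would be rejected. */
       static boolean rejects(int writeQuorum, int ackQuorum, int minRegionsForDurability) {
           // Java integer division truncates, so 3 / 2 == 1.
           return ackQuorum <= writeQuorum - (writeQuorum / minRegionsForDurability);
       }

       public static void main(String[] args) {
           // write quorum 3, ack quorum 2, min regions 2: 2 <= 3 - 1
           System.out.println("3/3/2 rejected: " + rejects(3, 2, 2));
           // write quorum 4, ack quorum 3, min regions 2: 3 <= 4 - 2
           System.out.println("4/4/3 rejected: " + rejects(4, 3, 2));
       }
   }
   ```

   Under that assumed mapping, 3/3/2 trips the condition and 4/4/3 does not.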
   
   Ok, so I have 3 regions, and I want 2 for durability. I can therefore tolerate only 1 region failing. If that region fails I have two regions left, and I require two acks. I have two bookies, they can both ack, so what's the problem? Why is 4/4/3 good and 3/3/2 bad? If the argument is that the initial placement might be 2 in one region and 1 in another, why doesn't this apply to 4/4/3 (3 in one region and 1 in another)? If we plug 3/3/2 into the comment, then we need to survive 3 - 2 = 1 region failures, and we need to make sure the acks cover 2 - 1 = 1 regions. Why does an ack quorum of 3 with a write quorum of 4 fulfil this while an ack quorum of 2 with a write quorum of 3 does not?
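
   One way to put numbers on this question is sketched below. It assumes the replicas are dealt evenly across the minimum-durability regions, which is my assumption about what the check has in mind and is not confirmed by the source; all names are hypothetical.

   ```java
   // Hypothetical sketch, not BookKeeper code: counts replicas per region under an assumed even spread.
   final class EvenSpreadSketch {
       /** Replicas per region when writeQuorum replicas are dealt round-robin over the given regions. */
       static int[] evenSpread(int writeQuorum, int regions) {
           int[] counts = new int[regions];
           for (int i = 0; i < writeQuorum; i++) {
               counts[i % regions]++;
           }
           return counts;
       }

       /** Largest number of replicas that fit in all regions but one: total minus the smallest region. */
       static int maxInAllButOneRegion(int writeQuorum, int regions) {
           int smallest = Integer.MAX_VALUE;
           for (int c : evenSpread(writeQuorum, regions)) {
               smallest = Math.min(smallest, c);
           }
           return writeQuorum - smallest;
       }

       /** True when every possible ack quorum must include a bookie from every region. */
       static boolean acksMustSpanAllRegions(int writeQuorum, int ackQuorum, int regions) {
           return ackQuorum > maxInAllButOneRegion(writeQuorum, regions);
       }
   }
   ```

   With an even spread over 2 regions, 3 replicas land as 2 + 1, so an ack quorum of 2 can be met from one region alone; 4 replicas land as 2 + 2, so an ack quorum of 3 always touches both. This does not cover the skewed 3 + 1 placement raised above.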
   
   I guess what's eating me is that I don't want the extra tail latency, or to pay for the extra disks. I just want 3 replicas and to survive a region outage. There doesn't seem to be a configuration that allows this.
   
   Ok, let's take the following (from the [docs](https://pulsar.apache.org/docs/4.0.x/administration-isolation-bookie/#region-aware-placement-policy)):
   
   > For example, the BookKeeper cluster has 4 regions, and each region has several racks with their bookie instances, as shown in the following diagram. If a topic is configured with EnsembleSize=3, WriteQuorum=3, and AckQuorum=2, the BookKeeper client chooses three different regions, such as Region A, Region C and Region D. For each region, it chooses one bookie on a single rack, such as Bookie5 on Rack2, Bookie17 on Rack6, and Bookie21 on Rack8.
   
   The only value of min regions for durability under which the expression evaluates to false for 3/3/2 is 1, which is a data-loss-prone config. So the docs are recommending either a configuration that invites data loss or one that is impossible according to the repp validation code.
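
   To back up that claim, here is a small sweep over the possible min-regions values for the 4-region docs example. It is a sketch under the same assumed form of the check as the `2 <= 3 - 3/2` expression above; the class and method names are hypothetical.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical sketch, not BookKeeper code: lists the min-regions values the assumed check accepts.
   final class MinRegionsSweep {
       /** Assumed form of the validation: true means the configuration would be rejected. */
       static boolean rejects(int writeQuorum, int ackQuorum, int minRegions) {
           return ackQuorum <= writeQuorum - (writeQuorum / minRegions);
       }

       /** Values of minRegions in 1..maxRegions for which the assumed check does not fire. */
       static List<Integer> acceptedValues(int writeQuorum, int ackQuorum, int maxRegions) {
           List<Integer> accepted = new ArrayList<>();
           for (int m = 1; m <= maxRegions; m++) {
               if (!rejects(writeQuorum, ackQuorum, m)) {
                   accepted.add(m);
               }
           }
           return accepted;
       }

       public static void main(String[] args) {
           // Write quorum 3, ack quorum 2, up to 4 regions as in the docs example.
           System.out.println(acceptedValues(3, 2, 4));
       }
   }
   ```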
   
   _Originally posted by @benjumanji in 
https://github.com/apache/pulsar/discussions/23913_

