michaeljmarshall opened a new issue, #18012:
URL: https://github.com/apache/pulsar/issues/18012

   ### Motivation
   
   The current BookKeeper configuration defaults to using 
`org.apache.bookkeeper.net.ScriptBasedMapping` for the `DNSToSwitchMapping` 
implementation. However, this default does not align with the broker's 
default, which is 
`org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping`. As a result, the 
default configuration for a Pulsar cluster does not lead to ideal rack 
awareness when ledgers need to be recovered: a user can configure a cluster 
for rack awareness and the brokers will honor that configuration, but the 
autorecovery process will not, because it does not have the correct BookKeeper 
cluster topology view.
   
   I propose we configure BookKeeper to use the broker's 
`ZkBookieRackAffinityMapping` class. That way, autorecovery will honor the 
operator's configured rack awareness policies out of the box.
   
   ### Goal
   
   Ensure consistent rack awareness policies.
   
   I propose treating this as a bug fix and patching all active versions of 
Pulsar.
   
   ### API Changes
   
   _No response_
   
   ### Implementation
   
   See https://github.com/apache/pulsar/pull/15640.
   
   Add a default value for `reppDnsResolverClass` to the 
`conf/bookkeeper.conf` configuration file. This change effectively switches 
the default from `org.apache.bookkeeper.net.ScriptBasedMapping` to 
`org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping`.
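
   As a rough sketch, the resulting entry in `conf/bookkeeper.conf` would 
look something like the fragment below. The key and class name are the ones 
named above; the comment wording is illustrative only, and the exact form in 
the linked pull request may differ.

```properties
# conf/bookkeeper.conf
# Resolve bookie racks the same way the broker does, so that autorecovery
# sees the same cluster topology when re-replicating ledgers.
reppDnsResolverClass=org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping
```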
   
   ### Alternatives
   
   The tradeoff is that a user relying on the `ScriptBasedMapping` default 
might accidentally get switched to the `ZkBookieRackAffinityMapping` 
implementation. Given that `ScriptBasedMapping` doesn't work out of the box, 
and that the brokers default to `ZkBookieRackAffinityMapping`, I think this is 
an acceptable tradeoff.
   
   ### Anything else?
   
   I manually verified that `ZkBookieRackAffinityMapping` works by running 
some tests in a minikube cluster deployed with the DataStax helm chart for 
Apache Pulsar. I set up 3 racks, 4 bookies, and a topic with E=2, Qw=2, and 
Qa=2. I then verified that the autorecovery pod correctly discovered the racks 
and identified when an ensemble was not following the rack placement policy 
after two bookies were removed. I documented my testing a bit more here: 
https://github.com/datastax/pulsar-helm-chart/pull/214.


