michaeljmarshall opened a new pull request #14177:
URL: https://github.com/apache/pulsar/pull/14177


   ### Motivation
   
   In Pulsar 2.8, there is currently a bug that can lead to an incorrectly 
cached value in the `childrenCache`. The resulting behavior is that the broker 
serves the stale cache value until it is evicted from the cache.
   
   ### Steps to Reproduce Issue
   
   Start a 2.8 cluster with at least 2 brokers. It is helpful to run with debug 
logging to observe the ZK watch notifications. Run the following bash commands 
in order:
   
   ```
   BROKER_1=192.168.6.228
   BROKER_2=192.168.79.61
   bin/pulsar-admin --admin-url http://$BROKER_1:8080 tenants create test
   bin/pulsar-admin --admin-url http://$BROKER_1:8080 namespaces create test/a
   bin/pulsar-admin --admin-url http://$BROKER_2:8080 topics list test/a
   bin/pulsar-admin --admin-url http://$BROKER_1:8080 topics create persistent://test/a/a
   bin/pulsar-admin --admin-url http://$BROKER_2:8080 topics list test/a
   ```
   
   When broker 2 handles the first `bin/pulsar-admin --admin-url http://$BROKER_2:8080 topics list test/a` command, it caches a miss in the `childrenCache` in `AbstractMetadataStore` for the path `/managed-ledgers/test/a/persistent`.
   
   After caching the miss, broker 2 only logs two ZK events:
   
   > 05:21:16.810 [main-EventThread] DEBUG 
org.apache.pulsar.metadata.impl.ZKMetadataStore - Received ZK watch : 
WatchedEvent state:SyncConnected type:NodeCreated 
path:/admin/local-policies/test/a
   
   > 05:21:19.808 [main-EventThread] DEBUG 
org.apache.pulsar.metadata.impl.ZKMetadataStore - Received ZK watch : 
WatchedEvent state:SyncConnected type:NodeCreated 
path:/managed-ledgers/test/a/persistent
   
   Note that the second event is of type `NodeCreated`. Because of this type, the `AbstractMetadataStore` does not invalidate the `childrenCache` entry for `/managed-ledgers/test/a/persistent`, so the cached miss is never evicted: 
   
   
https://github.com/apache/pulsar/blob/42422d84ab5c6d24b57138c39453b45d7dcfba35/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/AbstractMetadataStore.java#L163-L182
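The failure mode can be reproduced in a minimal, self-contained Java model. This is a sketch, not Pulsar's code: `zk`, `childrenCache`, `getChildren`, `accept` and `NotificationType` are hypothetical stand-ins. `accept` assumes the pre-fix rule is "on create/delete, evict only the parent path's entry; on children-changed, evict the path's own entry", which is a simplified reading of the lines linked above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaleChildrenCacheSketch {
    // Hypothetical notification types, loosely mirroring ZK watch event types.
    enum NotificationType { Created, Deleted, ChildrenChanged }

    // Hypothetical stand-in for ZooKeeper: node path -> child names.
    static final Map<String, List<String>> zk = new HashMap<>();
    // Hypothetical stand-in for childrenCache; a miss is cached as an empty list.
    static final Map<String, List<String>> childrenCache = new HashMap<>();

    // Serves from the cache, loading from `zk` only when the key is absent.
    static List<String> getChildren(String path) {
        return childrenCache.computeIfAbsent(
                path, p -> new ArrayList<>(zk.getOrDefault(p, new ArrayList<>())));
    }

    static String parent(String path) {
        int idx = path.lastIndexOf('/');
        return idx > 0 ? path.substring(0, idx) : null;
    }

    // Assumed pre-fix rule: Created/Deleted evicts only the parent's entry.
    static void accept(String path, NotificationType type) {
        if (type == NotificationType.Created || type == NotificationType.Deleted) {
            String parentPath = parent(path);
            if (parentPath != null) {
                childrenCache.remove(parentPath);
            }
        }
        if (type == NotificationType.ChildrenChanged) {
            childrenCache.remove(path);
        }
    }

    public static void main(String[] args) {
        String ns = "/managed-ledgers/test/a/persistent";
        // First `topics list test/a` on broker 2: the node is missing, so a miss is cached.
        System.out.println("before create: " + getChildren(ns)); // before create: []
        // `topics create persistent://test/a/a` on broker 1 creates the node with one child.
        zk.computeIfAbsent(ns, p -> new ArrayList<>()).add("a");
        // Broker 2 only receives NodeCreated for the namespace path itself.
        accept(ns, NotificationType.Created);
        // The cached miss for `ns` was never evicted, so this read is stale.
        System.out.println("after create:  " + getChildren(ns)); // after create:  []
    }
}
```

Under these assumptions the store holds the new topic, but `getChildren` keeps returning the cached empty list until something else evicts the entry.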
   
   I was not able to reproduce this issue in 2.9. My theory is that 2.9 avoids it because it has a persistent watch at `/`.
   
   Note also that when creating a second topic in the namespace, we see the 
following notification:
   
   > 07:14:42.181 [main-EventThread] DEBUG 
org.apache.pulsar.metadata.impl.ZKMetadataStore - Received ZK watch : 
WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/managed-ledgers/test/a/persistent
   
   In this case, we properly invalidate the child node in the cache.
   
   Also note that in 2.7 we _always_ invalidate the child node in the cache for every notification. I don't believe this is strictly necessary, because when a node's children change without the node itself being created or deleted, we receive a `NodeChildrenChanged` notification for that node.
   
   
https://github.com/apache/pulsar/blob/77f7965673119ff40c929b065ee837fe2256a221/pulsar-zookeeper-utils/src/main/java/org/apache/pulsar/zookeeper/ZooKeeperCache.java#L146
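For comparison, the two invalidation rules can be written as small functions that return the cache keys they would evict. Both are assumptions for illustration: `alwaysEvictPath` models only the 2.7 behavior stated above (any other invalidation 2.7 performs is omitted), and `preFix28` assumes the 2.8 rule evicts only the parent on create/delete and the path itself on children-changed. The class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class InvalidationRuleSketch {
    // Hypothetical event names, mirroring ZK watch event types.
    enum EventType { NodeCreated, NodeDeleted, NodeChildrenChanged, NodeDataChanged }

    static String parent(String path) {
        int idx = path.lastIndexOf('/');
        return idx > 0 ? path.substring(0, idx) : null;
    }

    // 2.7-style rule, as described: every notification evicts `path`.
    // Any other invalidation that 2.7 performs is not modeled here.
    static Set<String> alwaysEvictPath(String path, EventType type) {
        Set<String> keys = new LinkedHashSet<>();
        keys.add(path);
        return keys;
    }

    // Assumed pre-fix 2.8 rule: create/delete evicts only the parent,
    // children-changed evicts the path itself.
    static Set<String> preFix28(String path, EventType type) {
        Set<String> keys = new LinkedHashSet<>();
        switch (type) {
            case NodeCreated:
            case NodeDeleted: {
                String parentPath = parent(path);
                if (parentPath != null) {
                    keys.add(parentPath);
                }
                break;
            }
            case NodeChildrenChanged:
                keys.add(path);
                break;
            default:
                break; // a data change does not alter any child list
        }
        return keys;
    }

    public static void main(String[] args) {
        String ns = "/managed-ledgers/test/a/persistent";
        for (EventType t : EventType.values()) {
            System.out.println(t + ": always=" + alwaysEvictPath(ns, t)
                    + " preFix28=" + preFix28(ns, t));
        }
    }
}
```

For the `NodeCreated` event in the reproduction, `alwaysEvictPath` evicts `/managed-ledgers/test/a/persistent`, while `preFix28` evicts only `/managed-ledgers/test/a`, which leaves the cached miss in place.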
   
   ### Modifications
   
   * Invalidate the `path` for `childrenCache` when the `path` is created or 
deleted.
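A minimal sketch of the modification on a simplified, self-contained model: on a create or delete notification, also evict the `childrenCache` entry for `path` itself, in addition to whatever the handler already evicts (assumed here to be the parent's entry). The names (`zk`, `childrenCache`, `getChildren`, `accept`, `NotificationType`) are hypothetical stand-ins, and this is not the literal diff in this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FixedChildrenCacheSketch {
    // Hypothetical notification types, loosely mirroring ZK watch event types.
    enum NotificationType { Created, Deleted, ChildrenChanged }

    // Hypothetical stand-ins for ZooKeeper and for childrenCache (misses cached as []).
    static final Map<String, List<String>> zk = new HashMap<>();
    static final Map<String, List<String>> childrenCache = new HashMap<>();

    static List<String> getChildren(String path) {
        return childrenCache.computeIfAbsent(
                path, p -> new ArrayList<>(zk.getOrDefault(p, new ArrayList<>())));
    }

    static String parent(String path) {
        int idx = path.lastIndexOf('/');
        return idx > 0 ? path.substring(0, idx) : null;
    }

    static void accept(String path, NotificationType type) {
        if (type == NotificationType.Created || type == NotificationType.Deleted) {
            // The fix: creating or deleting `path` changes getChildren(path), so evict it.
            childrenCache.remove(path);
            String parentPath = parent(path);
            if (parentPath != null) {
                // The parent's child list gained or lost an entry.
                childrenCache.remove(parentPath);
            }
        }
        if (type == NotificationType.ChildrenChanged) {
            childrenCache.remove(path);
        }
    }

    public static void main(String[] args) {
        String ns = "/managed-ledgers/test/a/persistent";
        System.out.println("before create: " + getChildren(ns)); // before create: []
        zk.computeIfAbsent(ns, p -> new ArrayList<>()).add("a");
        accept(ns, NotificationType.Created);
        // The cached miss was evicted, so this reloads from `zk`.
        System.out.println("after create:  " + getChildren(ns)); // after create:  [a]
    }
}
```

In this model, evicting the path on create also covers a cached miss, and evicting it on delete drops a child list for a node that no longer exists.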
   
   ### Verifying this change
   
   I added a test that failed before this change and passes after the change.
   
   ### Does this pull request potentially affect one of the following parts:
   
   This is an internal change.
   
   ### Documentation
   - [x] `no-need-doc` 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
