horizonzy opened a new issue, #21058:
URL: https://github.com/apache/pulsar/issues/21058

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/pulsar/issues) 
and found nothing similar.
   
   
   ### Version
   
   The newest master.
   
   ### Minimal reproduce step
   
   1. Start AutoRecovery using `bin/bookkeeper recovery`
   2. Stop one bookie
   
   ### What did you expect to see?
   
   ### Superficial cause
   
   ```
   2023-08-21T13:10:47,207-0400 [ReplicationWorker] DEBUG 
org.apache.bookkeeper.replication.ReplicationWorker - Going to replicate the 
fragments of the ledger: 36955355
   
   2023-08-21T13:10:48,889-0400 [ReplicationWorker] DEBUG 
org.apache.bookkeeper.replication.ReplicationWorker - Founds fragments 
[Fragment(LedgerID: 36955355, FirstEntryID: 0[0], LastKnownEntryID: 735[735], 
Host: [bookie-host.com:3181], Closed: true)] for replication from ledger: 
36955355
   
   2023-08-21T13:10:48,911-0400 [ReplicationWorker] INFO  
org.apache.bookkeeper.client.LedgerFragmentReplicator - Replicating fragment 
Fragment(LedgerID: 36955355, FirstEntryID: 0[0], LastKnownEntryID: 735[735], 
Host: [bookie-host.com:3181], Closed: true) in 1 sub fragments.
   
   2023-08-21T13:10:49,393-0400 [main-EventThread] INFO  
org.apache.bookkeeper.client.LedgerFragmentReplicator - Updated ZK to point 
ledger fragments from old bookies to new bookies: 
{bookie-host.com:3181=bookie-host1.com:3181}
   
   2023-08-21T13:10:49,401-0400 [ReplicationWorker] INFO  
org.apache.bookkeeper.replication.ReplicationWorker - Ledger replicated 
successfully. ledger id is: 36955355
   
   2023-08-21T13:10:49,401-0400 [ReplicationWorker] DEBUG 
org.apache.pulsar.metadata.bookkeeper.PulsarLedgerUnderreplicationManager - 
markLedgerReplicated(ledgerId=36955355)
   
   2023-08-21T13:10:49,421-0400 [ReplicationWorker] DEBUG 
org.apache.pulsar.metadata.bookkeeper.PulsarLedgerUnderreplicationManager - 
releaseLedger(ledgerId=36955355)
   
   2023-08-21T13:10:49,430-0400 [ReplicationWorker] DEBUG 
org.apache.pulsar.metadata.bookkeeper.PulsarLedgerManager - Unregistered ledger 
metadata listener ReadOnlyLedgerHandle(lid = 36955355, id = 1009584614) on 
ledger 36955355.
   
   2023-08-21T13:10:49,431-0400 [ReplicationWorker] DEBUG 
org.apache.pulsar.metadata.bookkeeper.PulsarLedgerUnderreplicationManager - 
releaseLedger(ledgerId=36955355)
   
   2023-08-21T13:10:49,431-0400 [ReplicationWorker] DEBUG 
org.apache.pulsar.metadata.bookkeeper.PulsarLedgerUnderreplicationManager - 
getLedgerToRereplicate()
   
   2023-08-21T13:10:49,431-0400 [ReplicationWorker] DEBUG 
org.apache.pulsar.metadata.bookkeeper.PulsarLedgerUnderreplicationManager - 
isLedgerReplicationEnabled()
   
   
   2023-08-21T13:17:15,122-0400 [ReplicationWorker] DEBUG 
org.apache.bookkeeper.replication.ReplicationWorker - Going to replicate the 
fragments of the ledger: 23993683
   
   2023-08-21T13:17:15,167-0400 [ReplicationWorker] INFO  
org.apache.bookkeeper.replication.ReplicationWorker - Ledger replicated 
successfully. ledger id is: 23993683
   
   ```
   
   We can see the ledger `36955355` was completed quickly, it only took about 2 
seconds (13:10:47-13:10:49).
   
   After `36955355` completing, the ReplicationWorker will get the next 
underReplicated ledger id from the metadata store. From the logs, it takes 
about 6 minutes; this means the method 
`PulsarLedgerUnderreplicationManager#getLedgerToRereplicate` takes about 6 
minutes.
   
   in `PulsarLedgerUnderreplicationManager#getLedgerToRereplicate`, it will 
iterate the child nodes under `/ledgers/underreplication/ledgers`. And it only 
picks the node whose depth is 5, like 
`/ledgers/underreplication/ledgers/0000/0000/0000/4441/urL0000017473`. If the 
path only has 4 depths, like 
`/ledgers/underreplication/ledgers/0000/0000/0000/4442`, the path will be 
ignored, then pick the next path. 
   
   Example:
   There are 4 nodes.
   ```
   /ledgers/underreplication/ledgers/0000/0000/0000/4441/urL0000017473
   /ledgers/underreplication/ledgers/0000/0000/0000/45c8/urL0000017864
   /ledgers/underreplication/ledgers/0000/0000/0000/4785
   /ledgers/underreplication/ledgers/0000/0000/0000/4778
   ```
   
   When picking `/ledgers/underreplication/ledgers/0000/0000/0000/4785`(The 
pick algorithm is random), it found that it only has 4 depths 
('/0000/0000/0000/4785'), the node will be ignored. Then pick the next node 
`/ledgers/underreplication/ledgers/0000/0000/0000/4441`, and it found that it 
has 5 depths(`/0000/0000/0000/4441/urL0000017473`), then return the ledger id 
`17473`.
   
   We found that in the customer's cluster, it has too many invalid nodes with 
a depth of 4. So the replicationWorker iterates too many invalid nodes, and the 
iteration takes a long time.
   
   
   ### Root cause
   > Why are there so many invalid nodes in the client's cluster?
   
   Finally, we found `PulsarLedgerUnderreplicationManager` introduces the 
problem, after `PulsarLedgerUnderreplicationManager` completes the ledger 
replication, it will mark the ledger as replicated. The mark operation will 
delete the node 
`/ledgers/underreplication/ledgers/0000/0000/0000/4441/urL0000017473`, then 
leave the node `/ledgers/underreplication/ledgers/0000/0000/0000/4441/`, so 
went an invalid node.
   
https://github.com/apache/pulsar/blob/30073dbac0e941869b43e090d2682935e8f094e5/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/bookkeeper/PulsarLedgerUnderreplicationManager.java#L393-L422
   
   
   In `ZkLedgerUnderreplicationManager`, after delete the node 
`/ledgers/underreplication/ledgers/0000/0000/0000/4441/urL0000017473`, it 
checks if the child nodes are empty, and if they are, it deletes the parent 
node, so the node `/ledgers/underreplication/ledgers/0000/0000/0000/4441` will 
also be removed, and leave the node 
`/ledgers/underreplication/ledgers/0000/0000/0000`.
   
https://github.com/apache/bookkeeper/blob/5b5c05331757e7356579076970e61f119f5d34ae/bookkeeper-server/src/main/java/org/apache/bookkeeper/meta/ZkLedgerUnderreplicationManager.java#L391-L436
   
   
   _**PulsarLedgerUnderreplicationManager is at pulsar side, when the user 
config 
-`Dbookkeeper.metadata.client.drivers=org.apache.pulsar.metadata.bookkeeper.PulsarMetadataClientDriver`,
 the ReplicationWorker will use it.
   ZkLedgerUnderreplicationManager is at the bk side; it's the default.**_
   
   
   
   
   
   
   
   
   
   
   
   
   
   ### What did you see instead?
   
   PulsarLedgerUnderreplicationManager should also delete the empty parent 
nodes.
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to