lgxbslgx commented on issue #15776:
URL: https://github.com/apache/pulsar/issues/15776#issuecomment-1139665002

   I can reproduce this bug locally by using the following steps:
   
   1. Deploy a cluster according to the 
[document](https://pulsar.apache.org/docs/next/deploy-bare-metal). The cluster 
has 3 ZooKeeper nodes, 3 BookKeeper nodes (bookies) and 3 brokers, the same as 
in the document.
   2. Use the command `bin/bookkeeper shell simpletest --ensemble 3 
--writeQuorum 3 --ackQuorum 3 --numEntries 3` to test the bookkeeper. **Note: 
this step is important.**
   3. Produce and consume several times.
   4. Delete the `journalDirectories` and `ledgerDirectories` directories of 
one bookie, named `BK1`. (Same as @yebai1105's step 1.)
   5. Shut down the bookie `BK1`. 
   6. Run the command `bin/bookkeeper shell listunderreplicated` on bookie node 
`BK1`. (Same as @yebai1105's step 2, but @yebai1105 didn't say which bookie 
node the command was run on.)
   7. Run the command `bin/bookkeeper shell decommissionbookie` on bookie node 
`BK1`. (Same as @yebai1105's step 3, but @yebai1105 didn't say which bookie 
node the command was run on.)
   
   Then the same error message occurs. This is because the command 
`bin/bookkeeper shell simpletest --ensemble 3 --writeQuorum 3 --ackQuorum 3 
--numEntries 3` creates a ledger whose ensemble size equals its write quorum 
size and also equals the total number of bookies (3). So this ledger can't be 
re-replicated until a new bookie node is added.
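   
   The reasoning above can be sketched as a small check. This is a simplified 
model, not BookKeeper's actual placement logic: `replication_status` is a 
hypothetical helper, and it assumes the failed bookie was part of the ledger's 
ensemble and that any live bookie is an acceptable replacement.
   
   ```shell
   # Hypothetical helper: replication_status ENSEMBLE_SIZE TOTAL_BOOKIES FAILED_BOOKIES
   # A ledger needs ENSEMBLE_SIZE distinct bookies, so re-replication needs
   # at least that many live bookies once the failed ones are excluded.
   replication_status() {
     ensemble_size="$1"
     total="$2"
     failed="$3"
     live=$((total - failed))
     if [ "$live" -ge "$ensemble_size" ]; then
       echo "replicable"
     else
       echo "blocked"
     fi
   }
   
   # The simpletest ledger: ensemble 3 on a 3-bookie cluster with BK1 down.
   replication_status 3 3 1   # prints "blocked"
   # A ledger with ensemble 2 on the same cluster still has a spare bookie.
   replication_status 2 3 1   # prints "replicable"
   ```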
   
   Now I need @yebai1105 to confirm: did you run a similar command, such as 
`bin/bookkeeper shell simpletest --ensemble 4 --writeQuorum 4 --ackQuorum 
4 --numEntries 4`, to test your cluster when you deployed it?
   
   If you don't remember whether you ran this test when you deployed your 
cluster, you can use the following commands to get the bookie nodes of an 
under-replicated ledger (such as `396606`, which your log shows). 
   
   > 2022-05-25 16:40:17.0035 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 - 396606
   > 2022-05-25 16:40:17.0035 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 - 396606
   > 2022-05-25 16:40:17.0035 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 -        Ctime : 1651199961381
   > 2022-05-25 16:40:17.0036 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 - 112963
   > 2022-05-25 16:40:17.0036 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 -        Ctime : 1650363984734
   
   ```shell
   # open the ZooKeeper shell
   $ bin/pulsar zookeeper-shell -timeout 5000 -server <zk-ip/zk-domain>:<zk-port>
   
   # inside the ZooKeeper shell: get the under-replicated ledger 396606
   # (path assumes the default /ledgers root and the hierarchical ledger layout)
   get /ledgers/00/0039/L6606
   
   # another example: ledger 112963
   get /ledgers/00/0011/L2963
   ```
   
   You can then count the bookie nodes in that ledger's ensemble. If the number 
is 4 in your cluster, my assumption is right.
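   
   To look up other ledger IDs from the log, the znode path can be derived 
from the ID. A minimal sketch, assuming BookKeeper's default hierarchical 
layout (ID zero-padded to 10 digits, split 2/4/4, leaf node prefixed with `L`) 
and the default `/ledgers` root; `ledger_znode_path` is a hypothetical helper 
and only covers IDs that fit in 10 digits. Adjust the root if your cluster 
uses a different `zkLedgersRootPath`.
   
   ```shell
   # Hypothetical helper: ledger_znode_path LEDGER_ID [ROOT]
   ledger_znode_path() {
     ledger_id="$1"
     root="${2:-/ledgers}"                  # assumed default ledgers root
     padded=$(printf '%010d' "$ledger_id")  # e.g. 396606 -> 0000396606
     level1=$(printf '%s' "$padded" | cut -c1-2)
     level2=$(printf '%s' "$padded" | cut -c3-6)
     level3=$(printf '%s' "$padded" | cut -c7-10)
     printf '%s/%s/%s/L%s\n' "$root" "$level1" "$level2" "$level3"
   }
   
   ledger_znode_path 396606   # prints /ledgers/00/0039/L6606
   ledger_znode_path 112963   # prints /ledgers/00/0011/L2963
   ```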


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

