kwenZh opened a new issue, #15464:
URL: https://github.com/apache/pulsar/issues/15464

   **Describe the bug**
   there are a  pulsar cluster in k8s
   
   We decided to migrate bookie to a new  bookie cluster,  
   1, we set auto recovery is enabled
   2. deploy a new bookie statefulset normally
   3. Reduced number of instances one by one .
   4. i can see recovery is working
   
   
    old bookie cluster has 5 pod (0-4), first i close  bookie-4
   but in latest , there are some ledgerId are not  migrate successful, 
   
   bin/bookkeeper shell listunderreplicated
   ```
   
   02:27:18.605 [main-EventThread] INFO  
org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is 
connected now.
   02:27:18.720 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 - 90129
   02:27:18.721 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 -    Ctime : 1651748446769
   02:27:18.722 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 - 93326
   02:27:18.723 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 -    Ctime : 1651748452235
   02:27:18.724 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 - 90190
   02:27:18.724 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 -    Ctime : 1651748446993
   02:27:18.726 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 - 93508
   02:27:18.726 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 -    Ctime : 1651748452461
   02:27:18.728 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 - 90955
   02:27:18.728 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 -    Ctime : 1651748448866
   02:27:18.730 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 - 95981
   02:27:18.730 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 -    Ctime : 1651748457162
   02:27:18.732 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 - 92099
   02:27:18.732 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 -    Ctime : 1651748450587
   02:27:18.734 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 - 92461
   02:27:18.734 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 -    Ctime : 1651748451050
   02:27:18.735 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 - 98373
   02:27:18.736 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 -    Ctime : 1651748426002
   02:27:18.737 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 - 98613
   02:27:18.737 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
 -    Ctime : 1651748426602
   02:27:18.739 [main-SendThread(pulsar-cluster-zookeeper:2181)] WARN  
org.apache.zookeeper.ClientCnxn - An exception was thrown while closing send 
thread for session 0x2000018322a00a4.
   org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read 
additional data from server sessionid 0x2000018322a00a4, likely server has 
closed socket
           at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) 
~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
           at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
 ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
           at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290) 
[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
   02:27:18.854 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - 
EventThread shut down for session: 0x2000018322a00a4
   02:27:18.854 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 
0x2000018322a00a4 closed
   
   ```
   
   auto recovery log:
   ```
   02:37:31.367 [main-EventThread] INFO  
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown 
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/ledgers/underreplication/locks
   02:37:31.369 [main-EventThread] INFO  
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown 
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/ledgers/underreplication/locks
   02:38:47.113 [main-EventThread] INFO  
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown 
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/ledgers/underreplication/locks
   02:38:47.115 [main-EventThread] INFO  
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown 
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/ledgers/underreplication/locks
   02:39:10.731 [main-EventThread] INFO  
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown 
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/ledgers/underreplication/locks
   02:39:10.734 [main-EventThread] INFO  
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown 
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/ledgers/underreplication/locks
   02:39:10.736 [bookkeeper-io-3-4] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to 
bookie: 
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 
[id: 0xb94a56fe, L:/10.244.81.13:45666 - 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.736 [bookkeeper-io-3-4] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 
0xb94a56fe, L:/10.244.81.13:45666 - 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
 authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.737 [bookkeeper-io-3-1] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to 
bookie: 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 
[id: 0x0030dc6a, L:/10.244.81.13:44790 - 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.737 [bookkeeper-io-3-1] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 
0x0030dc6a, L:/10.244.81.13:44790 - 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
 authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.747 [bookkeeper-io-3-4] ERROR 
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received 
from bookie: 
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
   02:39:10.747 [bookkeeper-io-3-1] ERROR 
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received 
from bookie: 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
   02:39:10.747 [bookkeeper-io-3-4] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie 
channel [id: 0xb94a56fe, L:/10.244.81.13:45666 ! 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.747 [bookkeeper-io-3-1] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie 
channel [id: 0x0030dc6a, L:/10.244.81.13:44790 ! 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.748 [bookkeeper-io-3-2] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to 
bookie: 
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 
[id: 0xd8237aad, L:/10.244.81.13:45670 - 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.748 [bookkeeper-io-3-3] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to 
bookie: 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 
[id: 0x71fcf2a7, L:/10.244.81.13:44794 - 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.748 [bookkeeper-io-3-2] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 
0xd8237aad, L:/10.244.81.13:45670 - 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
 authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.748 [bookkeeper-io-3-3] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 
0x71fcf2a7, L:/10.244.81.13:44794 - 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
 authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.763 [bookkeeper-io-3-2] ERROR 
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received 
from bookie: 
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
   02:39:10.763 [bookkeeper-io-3-2] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie 
channel [id: 0xd8237aad, L:/10.244.81.13:45670 ! 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.764 [bookkeeper-io-3-3] ERROR 
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received 
from bookie: 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
   02:39:10.764 [bookkeeper-io-3-3] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie 
channel [id: 0x71fcf2a7, L:/10.244.81.13:44794 ! 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.765 [bookkeeper-io-3-4] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to 
bookie: 
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 
[id: 0x7021b4ff, L:/10.244.81.13:45674 - 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.765 [bookkeeper-io-3-4] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 
0x7021b4ff, L:/10.244.81.13:45674 - 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
 authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.773 [bookkeeper-io-3-4] ERROR 
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received 
from bookie: 
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
   02:39:10.773 [bookkeeper-io-3-4] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie 
channel [id: 0x7021b4ff, L:/10.244.81.13:45674 ! 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.773 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not 
available while reading L90190 E0 from bookie: 
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
   02:39:10.774 [bookkeeper-io-3-1] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to 
bookie: 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 
[id: 0x0c5a059e, L:/10.244.81.13:44798 - 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.774 [bookkeeper-io-3-1] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 
0x0c5a059e, L:/10.244.81.13:44798 - 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
 authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.783 [bookkeeper-io-3-1] ERROR 
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received 
from bookie: 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
   02:39:10.783 [bookkeeper-io-3-1] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie 
channel [id: 0x0c5a059e, L:/10.244.81.13:44798 ! 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.784 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not 
available while reading L90190 E0 from bookie: 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
   02:39:10.784 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR 
org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: 
L90190 E0-E0, Sent to 
[pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181], 
Heard from [] : bitset = {}, Error = 'Bookie handle is not available'. First 
unread entry is (-1, rc = null)
   02:39:10.784 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR 
org.apache.bookkeeper.replication.ReplicationWorker - Received error: -8 while 
trying to read entry: 0 of ledger: 90190 in ReplicationWorker
   02:39:10.784 [ReplicationWorker] ERROR 
org.apache.bookkeeper.replication.ReplicationWorker - Failed to read faulty 
entries, so giving up replicating ledgerFragment Fragment(LedgerID: 90190, 
FirstEntryID: 0[0], LastKnownEntryID: 0[0], Host: 
[pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181], 
Closed: true)
   02:39:10.784 [bookkeeper-io-3-2] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to 
bookie: 
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 
[id: 0xacb8bca9, L:/10.244.81.13:45680 - 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.784 [bookkeeper-io-3-2] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 
0xacb8bca9, L:/10.244.81.13:45680 - 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
 authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.784 [bookkeeper-io-3-3] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to 
bookie: 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 
[id: 0x42c42991, L:/10.244.81.13:44804 - 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.784 [bookkeeper-io-3-3] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 
0x42c42991, L:/10.244.81.13:44804 - 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
 authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.793 [bookkeeper-io-3-2] ERROR 
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received 
from bookie: 
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
   02:39:10.793 [bookkeeper-io-3-2] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie 
channel [id: 0xacb8bca9, L:/10.244.81.13:45680 ! 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.799 [bookkeeper-io-3-3] ERROR 
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received 
from bookie: 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
   02:39:10.799 [bookkeeper-io-3-3] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie 
channel [id: 0x42c42991, L:/10.244.81.13:44804 ! 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.800 [bookkeeper-io-3-4] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to 
bookie: 
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 
[id: 0xda6ef22c, L:/10.244.81.13:45684 - 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.800 [bookkeeper-io-3-4] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 
0xda6ef22c, L:/10.244.81.13:45684 - 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
 authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.800 [bookkeeper-io-3-1] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to 
bookie: 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 
[id: 0x7eb38ed7, L:/10.244.81.13:44808 - 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.800 [bookkeeper-io-3-1] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 
0x7eb38ed7, L:/10.244.81.13:44808 - 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
 authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:10.808 [bookkeeper-io-3-1] ERROR 
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received 
from bookie: 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
   02:39:10.808 [bookkeeper-io-3-1] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie 
channel [id: 0x7eb38ed7, L:/10.244.81.13:44808 ! 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:10.850 [bookkeeper-io-3-4] ERROR 
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received 
from bookie: 
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
   02:39:10.850 [bookkeeper-io-3-4] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie 
channel [id: 0xda6ef22c, L:/10.244.81.13:45684 ! 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:10.850 [ReplicationWorker] ERROR 
org.apache.bookkeeper.replication.ReplicationWorker - ReplicationWorker failed 
to replicate Ledger : 90190 for 32 number of times, so deferring the ledger 
lock release by 300000 msecs
   02:39:14.886 [main-EventThread] INFO  
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown 
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/ledgers/underreplication/locks
   02:39:14.888 [main-EventThread] INFO  
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown 
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/ledgers/underreplication/locks
   02:39:30.519 [main-EventThread] INFO  
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown 
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/ledgers/underreplication/locks
   02:39:30.522 [main-EventThread] INFO  
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown 
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged 
path:/ledgers/underreplication/locks
   02:39:30.524 [bookkeeper-io-3-2] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to 
bookie: 
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 
[id: 0xa9665a84, L:/10.244.81.13:45976 - 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
   02:39:30.524 [bookkeeper-io-3-2] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 
0xa9665a84, L:/10.244.81.13:45976 - 
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
 authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:30.524 [bookkeeper-io-3-3] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to 
bookie: 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 
[id: 0x0628fbc9, L:/10.244.81.13:45100 - 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
   02:39:30.524 [bookkeeper-io-3-3] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 
0x0628fbc9, L:/10.244.81.13:45100 - 
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
 authenticated as BookKeeperPrincipal{ANONYMOUS}
   02:39:30.527 [ReplicationWorker] INFO  
org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve 
pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, 
bookie is unknown 
org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: 
Bookie handle is not available
   02:39:30.527 [ReplicationWorker] ERROR 
org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to 
pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 as 
endpoint resolution failed (probably bookie is down) err 
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: 
Cannot resolve bookieId 
pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, 
bookie does not exist or it is not running
   02:39:30.527 [ReplicationWorker] ERROR 
org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to 
bookie: 
null/pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181,
 current state CONNECTING :
   
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: 
Cannot resolve bookieId 
pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, 
bookie does not exist or it is not running
           at 
org.apache.bookkeeper.client.DefaultBookieAddressResolver.resolve(DefaultBookieAddressResolver.java:63)
 ~[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.proto.PerChannelBookieClient.connect(PerChannelBookieClient.java:531)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.proto.PerChannelBookieClient.connectIfNeededAndDoOp(PerChannelBookieClient.java:657)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.obtain(DefaultPerChannelBookieClientPool.java:121)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.obtain(DefaultPerChannelBookieClientPool.java:116)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.proto.BookieClientImpl.readEntry(BookieClientImpl.java:509)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.proto.BookieClientImpl.readEntry(BookieClientImpl.java:495)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.proto.BookieClientImpl.readEntry(BookieClientImpl.java:489)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.client.PendingReadOp.sendReadTo(PendingReadOp.java:576) 
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.sendNextRead(PendingReadOp.java:405)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.read(PendingReadOp.java:386)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.client.PendingReadOp.initiate(PendingReadOp.java:530) 
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.client.PendingReadOp.safeRun(PendingReadOp.java:540) 
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) 
[org.apache.bookkeeper-bookkeeper-common-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.client.LedgerHandle.readEntriesInternalAsync(LedgerHandle.java:896)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.client.LedgerHandle.asyncReadEntriesInternal(LedgerHandle.java:800)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.client.LedgerHandle.asyncReadEntries(LedgerHandle.java:694)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.replication.ReplicationWorker.tryReadingFaultyEntries(ReplicationWorker.java:326)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:378)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:277)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.replication.ReplicationWorker.run(ReplicationWorker.java:238)
 [org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
 [io.netty-netty-common-4.1.66.Final.jar:4.1.66.Final]
           at java.lang.Thread.run(Thread.java:829) [?:?]
   Caused by: 
org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: 
Bookie handle is not available
           at 
org.apache.bookkeeper.discover.ZKRegistrationClient.getBookieServiceInfo(ZKRegistrationClient.java:248)
 ~[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           at 
org.apache.bookkeeper.client.DefaultBookieAddressResolver.resolve(DefaultBookieAddressResolver.java:43)
 ~[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
           ... 22 more
   02:39:30.527 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO  
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not 
available while reading L92461 E2060 from bookie: 
pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
   02:39:30.528 [ReplicationWorker] INFO  
org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve 
pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, 
bookie is unknown 
org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: 
Bookie handle is not available
   
   ```
   
   i notice the main error log:  
    ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame 
received from bookie: 
    ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry 
failed: L95981 E210-E210, Sent to 
[pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, 
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181], 
Heard from [] : bitset = {}, Error = 'Bookie handle is not available'. First 
unread entry is (-1, rc = null)
   
   
   the ledger id  metadata
   ```
   2:49:10.484 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand - 
ledgerID: 95981
   02:49:10.490 [main] INFO  
org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand - 
LedgerMetadata{formatVersion=3, ensembleSize=3, writeQuorumSize=2, 
ackQuorumSize=2, state=CLOSED, length=67946172, lastEntryId=2037, 
digestType=CRC32C, password=base64:, 
ensembles={0=[pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181,
 pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181, 
pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181]}, 
customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, 
pulsar/managed-ledger=base64:Y29sbGVjdC9kZWZhdWx0L3BlcnNpc3RlbnQvaW9hX2xvZy1wYXJ0aXRpb24tMg==,
 application=base64:cHVsc2Fy}}
   
   ```
   
   
   
   **To Reproduce**
   Steps to reproduce the behavior:
   1. Go to '...'
   2. Click on '....'
   3. Scroll down to '....'
   6. See error
   
   **Expected behavior**
   recovery log is normally
   
   **Screenshots**
   
   
![image](https://user-images.githubusercontent.com/38367518/167060160-e59abcb3-61d7-42b6-a318-c32ad111b22a.png)
   
   
![image](https://user-images.githubusercontent.com/38367518/167060175-e8510807-eab2-4fb6-91d3-4ec0c95059fe.png)
   
   
   
   **Desktop (please complete the following information):**
    - OS: centos7
    - pulsar: 2.8.1
   
   **Additional context**
   Add any other context about the problem here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to