kwenZh opened a new issue, #15464:
URL: https://github.com/apache/pulsar/issues/15464
**Describe the bug**
there are a pulsar cluster in k8s
We decided to migrate bookie to a new bookie cluster,
1, we set auto recovery is enabled
2. deploy a new bookie statefulset normally
3. Reduced number of instances one by one .
4. i can see recovery is working
old bookie cluster has 5 pod (0-4), first i close bookie-4
but in latest , there are some ledgerId are not migrate successful,
bin/bookkeeper shell listunderreplicated
```
02:27:18.605 [main-EventThread] INFO
org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is
connected now.
02:27:18.720 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- 90129
02:27:18.721 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- Ctime : 1651748446769
02:27:18.722 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- 93326
02:27:18.723 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- Ctime : 1651748452235
02:27:18.724 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- 90190
02:27:18.724 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- Ctime : 1651748446993
02:27:18.726 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- 93508
02:27:18.726 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- Ctime : 1651748452461
02:27:18.728 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- 90955
02:27:18.728 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- Ctime : 1651748448866
02:27:18.730 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- 95981
02:27:18.730 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- Ctime : 1651748457162
02:27:18.732 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- 92099
02:27:18.732 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- Ctime : 1651748450587
02:27:18.734 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- 92461
02:27:18.734 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- Ctime : 1651748451050
02:27:18.735 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- 98373
02:27:18.736 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- Ctime : 1651748426002
02:27:18.737 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- 98613
02:27:18.737 [main] INFO
org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand
- Ctime : 1651748426602
02:27:18.739 [main-SendThread(pulsar-cluster-zookeeper:2181)] WARN
org.apache.zookeeper.ClientCnxn - An exception was thrown while closing send
thread for session 0x2000018322a00a4.
org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read
additional data from server sessionid 0x2000018322a00a4, likely server has
closed socket
at
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77)
~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290)
[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
02:27:18.854 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn -
EventThread shut down for session: 0x2000018322a00a4
02:27:18.854 [main] INFO org.apache.zookeeper.ZooKeeper - Session:
0x2000018322a00a4 closed
```
auto recovery log:
```
02:37:31.367 [main-EventThread] INFO
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged
path:/ledgers/underreplication/locks
02:37:31.369 [main-EventThread] INFO
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged
path:/ledgers/underreplication/locks
02:38:47.113 [main-EventThread] INFO
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged
path:/ledgers/underreplication/locks
02:38:47.115 [main-EventThread] INFO
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged
path:/ledgers/underreplication/locks
02:39:10.731 [main-EventThread] INFO
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged
path:/ledgers/underreplication/locks
02:39:10.734 [main-EventThread] INFO
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged
path:/ledgers/underreplication/locks
02:39:10.736 [bookkeeper-io-3-4] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to
bookie:
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
[id: 0xb94a56fe, L:/10.244.81.13:45666 -
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.736 [bookkeeper-io-3-4] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id:
0xb94a56fe, L:/10.244.81.13:45666 -
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.737 [bookkeeper-io-3-1] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to
bookie:
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
[id: 0x0030dc6a, L:/10.244.81.13:44790 -
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.737 [bookkeeper-io-3-1] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id:
0x0030dc6a, L:/10.244.81.13:44790 -
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.747 [bookkeeper-io-3-4] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received
from bookie:
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
02:39:10.747 [bookkeeper-io-3-1] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received
from bookie:
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
02:39:10.747 [bookkeeper-io-3-4] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie
channel [id: 0xb94a56fe, L:/10.244.81.13:45666 !
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.747 [bookkeeper-io-3-1] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie
channel [id: 0x0030dc6a, L:/10.244.81.13:44790 !
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.748 [bookkeeper-io-3-2] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to
bookie:
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
[id: 0xd8237aad, L:/10.244.81.13:45670 -
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.748 [bookkeeper-io-3-3] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to
bookie:
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
[id: 0x71fcf2a7, L:/10.244.81.13:44794 -
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.748 [bookkeeper-io-3-2] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id:
0xd8237aad, L:/10.244.81.13:45670 -
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.748 [bookkeeper-io-3-3] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id:
0x71fcf2a7, L:/10.244.81.13:44794 -
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.763 [bookkeeper-io-3-2] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received
from bookie:
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
02:39:10.763 [bookkeeper-io-3-2] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie
channel [id: 0xd8237aad, L:/10.244.81.13:45670 !
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.764 [bookkeeper-io-3-3] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received
from bookie:
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
02:39:10.764 [bookkeeper-io-3-3] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie
channel [id: 0x71fcf2a7, L:/10.244.81.13:44794 !
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.765 [bookkeeper-io-3-4] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to
bookie:
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
[id: 0x7021b4ff, L:/10.244.81.13:45674 -
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.765 [bookkeeper-io-3-4] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id:
0x7021b4ff, L:/10.244.81.13:45674 -
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.773 [bookkeeper-io-3-4] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received
from bookie:
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
02:39:10.773 [bookkeeper-io-3-4] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie
channel [id: 0x7021b4ff, L:/10.244.81.13:45674 !
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.773 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L90190 E0 from bookie:
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
02:39:10.774 [bookkeeper-io-3-1] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to
bookie:
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
[id: 0x0c5a059e, L:/10.244.81.13:44798 -
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.774 [bookkeeper-io-3-1] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id:
0x0c5a059e, L:/10.244.81.13:44798 -
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.783 [bookkeeper-io-3-1] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received
from bookie:
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
02:39:10.783 [bookkeeper-io-3-1] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie
channel [id: 0x0c5a059e, L:/10.244.81.13:44798 !
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.784 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L90190 E0 from bookie:
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
02:39:10.784 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR
org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed:
L90190 E0-E0, Sent to
[pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181,
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181],
Heard from [] : bitset = {}, Error = 'Bookie handle is not available'. First
unread entry is (-1, rc = null)
02:39:10.784 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR
org.apache.bookkeeper.replication.ReplicationWorker - Received error: -8 while
trying to read entry: 0 of ledger: 90190 in ReplicationWorker
02:39:10.784 [ReplicationWorker] ERROR
org.apache.bookkeeper.replication.ReplicationWorker - Failed to read faulty
entries, so giving up replicating ledgerFragment Fragment(LedgerID: 90190,
FirstEntryID: 0[0], LastKnownEntryID: 0[0], Host:
[pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181,
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181],
Closed: true)
02:39:10.784 [bookkeeper-io-3-2] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to
bookie:
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
[id: 0xacb8bca9, L:/10.244.81.13:45680 -
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.784 [bookkeeper-io-3-2] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id:
0xacb8bca9, L:/10.244.81.13:45680 -
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.784 [bookkeeper-io-3-3] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to
bookie:
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
[id: 0x42c42991, L:/10.244.81.13:44804 -
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.784 [bookkeeper-io-3-3] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id:
0x42c42991, L:/10.244.81.13:44804 -
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.793 [bookkeeper-io-3-2] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received
from bookie:
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
02:39:10.793 [bookkeeper-io-3-2] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie
channel [id: 0xacb8bca9, L:/10.244.81.13:45680 !
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.799 [bookkeeper-io-3-3] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received
from bookie:
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
02:39:10.799 [bookkeeper-io-3-3] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie
channel [id: 0x42c42991, L:/10.244.81.13:44804 !
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.800 [bookkeeper-io-3-4] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to
bookie:
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
[id: 0xda6ef22c, L:/10.244.81.13:45684 -
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.800 [bookkeeper-io-3-4] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id:
0xda6ef22c, L:/10.244.81.13:45684 -
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.800 [bookkeeper-io-3-1] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to
bookie:
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
[id: 0x7eb38ed7, L:/10.244.81.13:44808 -
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.800 [bookkeeper-io-3-1] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id:
0x7eb38ed7, L:/10.244.81.13:44808 -
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:10.808 [bookkeeper-io-3-1] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received
from bookie:
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181
02:39:10.808 [bookkeeper-io-3-1] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie
channel [id: 0x7eb38ed7, L:/10.244.81.13:44808 !
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:10.850 [bookkeeper-io-3-4] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame received
from bookie:
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181
02:39:10.850 [bookkeeper-io-3-4] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie
channel [id: 0xda6ef22c, L:/10.244.81.13:45684 !
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:10.850 [ReplicationWorker] ERROR
org.apache.bookkeeper.replication.ReplicationWorker - ReplicationWorker failed
to replicate Ledger : 90190 for 32 number of times, so deferring the ledger
lock release by 300000 msecs
02:39:14.886 [main-EventThread] INFO
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged
path:/ledgers/underreplication/locks
02:39:14.888 [main-EventThread] INFO
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged
path:/ledgers/underreplication/locks
02:39:30.519 [main-EventThread] INFO
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged
path:/ledgers/underreplication/locks
02:39:30.522 [main-EventThread] INFO
org.apache.bookkeeper.meta.ZkLedgerUnderreplicationManager - Latch countdown
due to ZK event: WatchedEvent state:SyncConnected type:NodeChildrenChanged
path:/ledgers/underreplication/locks
02:39:30.524 [bookkeeper-io-3-2] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to
bookie:
pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
[id: 0xa9665a84, L:/10.244.81.13:45976 -
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
02:39:30.524 [bookkeeper-io-3-2] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id:
0xa9665a84, L:/10.244.81.13:45976 -
R:pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.74.151:3181]
authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:30.524 [bookkeeper-io-3-3] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to
bookie:
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
[id: 0x0628fbc9, L:/10.244.81.13:45100 -
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
02:39:30.524 [bookkeeper-io-3-3] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id:
0x0628fbc9, L:/10.244.81.13:45100 -
R:pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local/10.244.178.149:3181]
authenticated as BookKeeperPrincipal{ANONYMOUS}
02:39:30.527 [ReplicationWorker] INFO
org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve
pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181,
bookie is unknown
org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException:
Bookie handle is not available
02:39:30.527 [ReplicationWorker] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to
pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181 as
endpoint resolution failed (probably bookie is down) err
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
Cannot resolve bookieId
pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181,
bookie does not exist or it is not running
02:39:30.527 [ReplicationWorker] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to
bookie:
null/pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181,
current state CONNECTING :
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
Cannot resolve bookieId
pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181,
bookie does not exist or it is not running
at
org.apache.bookkeeper.client.DefaultBookieAddressResolver.resolve(DefaultBookieAddressResolver.java:63)
~[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.proto.PerChannelBookieClient.connect(PerChannelBookieClient.java:531)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.proto.PerChannelBookieClient.connectIfNeededAndDoOp(PerChannelBookieClient.java:657)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.obtain(DefaultPerChannelBookieClientPool.java:121)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.obtain(DefaultPerChannelBookieClientPool.java:116)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.proto.BookieClientImpl.readEntry(BookieClientImpl.java:509)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.proto.BookieClientImpl.readEntry(BookieClientImpl.java:495)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.proto.BookieClientImpl.readEntry(BookieClientImpl.java:489)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.client.PendingReadOp.sendReadTo(PendingReadOp.java:576)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.sendNextRead(PendingReadOp.java:405)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.read(PendingReadOp.java:386)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.client.PendingReadOp.initiate(PendingReadOp.java:530)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.client.PendingReadOp.safeRun(PendingReadOp.java:540)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
[org.apache.bookkeeper-bookkeeper-common-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.client.LedgerHandle.readEntriesInternalAsync(LedgerHandle.java:896)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.client.LedgerHandle.asyncReadEntriesInternal(LedgerHandle.java:800)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.client.LedgerHandle.asyncReadEntries(LedgerHandle.java:694)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.replication.ReplicationWorker.tryReadingFaultyEntries(ReplicationWorker.java:326)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:378)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:277)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.replication.ReplicationWorker.run(ReplicationWorker.java:238)
[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[io.netty-netty-common-4.1.66.Final.jar:4.1.66.Final]
at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by:
org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException:
Bookie handle is not available
at
org.apache.bookkeeper.discover.ZKRegistrationClient.getBookieServiceInfo(ZKRegistrationClient.java:248)
~[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
at
org.apache.bookkeeper.client.DefaultBookieAddressResolver.resolve(DefaultBookieAddressResolver.java:43)
~[org.apache.bookkeeper-bookkeeper-server-4.14.2.jar:4.14.2]
... 22 more
02:39:30.527 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L92461 E2060 from bookie:
pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181
02:39:30.528 [ReplicationWorker] INFO
org.apache.bookkeeper.client.DefaultBookieAddressResolver - Cannot resolve
pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181,
bookie is unknown
org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException:
Bookie handle is not available
```
i notice the main error log:
ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Corrupted frame
received from bookie:
ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry
failed: L95981 E210-E210, Sent to
[pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181,
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181],
Heard from [] : bitset = {}, Error = 'Bookie handle is not available'. First
unread entry is (-1, rc = null)
the ledger id metadata
```
2:49:10.484 [main] INFO
org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand -
ledgerID: 95981
02:49:10.490 [main] INFO
org.apache.bookkeeper.tools.cli.commands.client.LedgerMetaDataCommand -
LedgerMetadata{formatVersion=3, ensembleSize=3, writeQuorumSize=2,
ackQuorumSize=2, state=CLOSED, length=67946172, lastEntryId=2037,
digestType=CRC32C, password=base64:,
ensembles={0=[pulsar-cluster-bookie-3.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181,
pulsar-cluster-bookie-2.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181,
pulsar-cluster-bookie-4.pulsar-cluster-bookie.pulsar.svc.cluster.local:3181]},
customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=,
pulsar/managed-ledger=base64:Y29sbGVjdC9kZWZhdWx0L3BlcnNpc3RlbnQvaW9hX2xvZy1wYXJ0aXRpb24tMg==,
application=base64:cHVsc2Fy}}
```
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
6. See error
**Expected behavior**
recovery log is normally
**Screenshots**


**Desktop (please complete the following information):**
- OS: centos7
- pulsar: 2.8.1
**Additional context**
Add any other context about the problem here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]