hozumi commented on issue #6163:
URL: https://github.com/apache/pulsar/issues/6163#issuecomment-717981358
I'm hitting the same problem right now on a production cluster of 5 bookie
nodes.
Is there a way to speed up the rereplication process?
- decommissionbookie command log
```
$ docker exec -it pulsar_bookkeeper bin/bookkeeper shell decommissionbookie -bookieid 'myoldbookienode1:3181'
....
07:50:23.151 [main-SendThread(mybookienode5:2181)] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to server
mybookienode5/10.12.9.5:2181. Will not attempt to authenticate using SASL
(unknown error)
07:50:23.160 [main-SendThread(mybookienode5:2181)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established, initiating
session, client: /10.12.9.1:49356, server: mybookienode5/10.12.9.5:2181
07:50:23.233 [main-SendThread(mybookienode5:2181)] INFO
org.apache.zookeeper.ClientCnxn - Session establishment complete on server
mybookienode5/10.12.9.5:2181, sessionid = 0x5017fae8c900878, negotiated timeout
= 30000
07:50:23.237 [main-EventThread] INFO
org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is
connected now.
07:50:23.611 [main] ERROR
org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to
initialize DNS Resolver org.apache.bookkeeper.net.ScriptBasedMapping, used
default subnet resolver : java.lang.RuntimeException: No network topology
script is found when using script based DNS resolver.
07:50:23.662 [main] INFO
org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Initialize
rackaware ensemble placement policy @ <Bookie:10.12.9.1:0> @ /default-rack :
org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy$DefaultResolver.
07:50:23.662 [main] INFO
org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Not weighted
07:50:23.668 [main] INFO org.apache.bookkeeper.client.BookKeeper - Weighted
ledger placement is not enabled
07:50:23.719 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO
org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node:
/default-rack/mybookienode3:3181
07:50:23.720 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO
org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node:
/default-rack/mybookienode1:3181
07:50:23.720 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO
org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node:
/default-rack/mybookienode2:3181
07:50:23.720 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO
org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node:
/default-rack/mybookienode5:3181
07:50:24.146 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Resetting LostBookieRecoveryDelay value: 0, to kickstart audit task
07:51:28.159 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 144366
08:48:56.814 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 144217
09:33:14.303 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 143970
10:28:59.324 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 143713
10:43:45.331 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 143517
10:55:02.345 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 143456
11:06:15.621 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 143416
11:17:31.914 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 143367
11:28:46.275 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 143308
11:40:01.941 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 143246
11:51:15.598 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 143187
12:02:28.513 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 143131
12:13:44.562 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 143066
12:19:25.577 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO
org.apache.bookkeeper.net.NetworkTopologyImpl - Removing a node:
/default-rack/mybookienode2:3181
12:21:01.745 [BookKeeperClientScheduler-OrderedScheduler-0-0] INFO
org.apache.bookkeeper.net.NetworkTopologyImpl - Adding a new node:
/default-rack/mybookienode2:3181
13:08:53.974 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 142926
14:10:06.933 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin -
Count of Ledgers which need to be rereplicated: 142640
```
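To put a number on how slow this is, here is a rough estimate from the first and last "Count of Ledgers" lines in the log above. It is only a sketch: the two log lines are copied from the output with the mail line-wrapping removed, the helper names (`parse_samples`, `drain_rate_per_hour`) are made up for illustration, and it assumes both timestamps fall on the same day and that the drain rate stays roughly constant.

```python
import re
from datetime import datetime

# First and last progress lines from the decommissionbookie log above,
# re-joined onto single lines (the mail wrapped them).
LOG = """\
07:51:28.159 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin - Count of Ledgers which need to be rereplicated: 144366
14:10:06.933 [main] INFO org.apache.bookkeeper.client.BookKeeperAdmin - Count of Ledgers which need to be rereplicated: 142640
"""

# Capture the timestamp and the remaining-ledger count of each progress line.
PATTERN = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) .*rereplicated: (\d+)$", re.M)


def parse_samples(log):
    """Return a list of (timestamp, remaining_ledgers) tuples."""
    return [
        (datetime.strptime(ts, "%H:%M:%S.%f"), int(count))
        for ts, count in PATTERN.findall(log)
    ]


def drain_rate_per_hour(samples):
    """Average ledgers rereplicated per hour between first and last sample."""
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600
    return (n0 - n1) / hours


samples = parse_samples(LOG)
rate = drain_rate_per_hour(samples)
remaining = samples[-1][1]
eta_days = remaining / rate / 24  # naive linear extrapolation

print(f"rate: {rate:.1f} ledgers/hour, estimated time left: {eta_days:.1f} days")
```

That works out to roughly 270 ledgers per hour, so at this pace the remaining ledgers would take on the order of three weeks to drain.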
- a bookie log
```
14:28:03.608 [BookKeeperClientWorker-OrderedExecutor-23-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L51031 E18397 from bookie: myoldbookienode1:3181
```
```
14:28:03.682 [bookkeeper-io-46-31] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to
bookie: [id: 0xf237a093, L:/10.12.9.1:10898]/myoldbookienode1:3181, current
state CONNECTING :
io.netty.channel.AbstractChannel$AnnotatedConnectException:
finishConnect(..) failed: Connection refused: myoldbookienode1/10.12.9.1:3181
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection
refused
at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
~[io.netty-netty-transport-native-unix-common-4.1.48.Final-linux-x86_64.jar:4.1.48.Final]
at io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
~[io.netty-netty-transport-native-unix-common-4.1.48.Final-linux-x86_64.jar:4.1.48.Final]
at
io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:672)
[io.netty-netty-transport-native-epoll-4.1.48.Final-linux-x86_64.jar:4.1.48.Final]
at
io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:649)
[io.netty-netty-transport-native-epoll-4.1.48.Final-linux-x86_64.jar:4.1.48.Final]
at
io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:529)
[io.netty-netty-transport-native-epoll-4.1.48.Final-linux-x86_64.jar:4.1.48.Final]
at
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:465)
[io.netty-netty-transport-native-epoll-4.1.48.Final-linux-x86_64.jar:4.1.48.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
[io.netty-netty-transport-native-epoll-4.1.48.Final-linux-x86_64.jar:4.1.48.Final]
at
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
[io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
at
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
[io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
```
Pulsar docker image version: `apachepulsar/pulsar-all:2.5.2`
BookKeeper docker image version: `apachepulsar/pulsar-all:2.5.2`