Shawyeok opened a new issue, #15390:
URL: https://github.com/apache/pulsar/issues/15390

   **Describe the bug**
   In our production, after a node restart, a topic partition keeping load 
timed out:
   ```
   2022-04-26T21:39:30.444Z ERROR [pulsar-io-4-16] 
org.apache.pulsar.broker.service.ServerCnx - [/10.xxx:56961] Failed to create 
topic persistent://data/uu/feature.plain.result.queue-partition-31, 
producerId=31
   java.util.concurrent.CompletionException: 
org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Failed to 
load topic within timeout
        at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at 
java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
        at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at 
org.apache.pulsar.common.util.FutureUtil.lambda$addTimeoutHandling$1(FutureUtil.java:141)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at 
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
        at 
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
        at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: 
org.apache.pulsar.common.util.FutureUtil$LowOverheadTimeoutException: Failed to 
load topic within timeout
        at 
org.apache.pulsar.broker.service.BrokerService.futureWithDeadline(...)(Unknown 
Source)
   ```
   The log shows there's one cursor recovery didn't callback, it seems cursor's 
ledger recover took very long time:
   ```
   2022-04-26T21:38:30.533Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Loading cursor 
data.feature.c.user.deliver.and.chat.features.to.hbase
   2022-04-26T21:38:30.533Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Recovering from 
bookkeeper ledger cursor: data.feature.c.user.deliver.and.chat.features.to.hbase
   2022-04-26T21:38:30.539Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Consumer 
data.feature.c.user.deliver.and.chat.features.to.hbase meta-data recover from 
ledger 110091138
   2022-04-26T21:39:30.560Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Loading cursor 
data.feature.c.user.deliver.and.chat.features.to.hbase
   2022-04-26T21:39:30.560Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Recovering from 
bookkeeper ledger cursor: data.feature.c.user.deliver.and.chat.features.to.hbase
   2022-04-26T21:39:30.561Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Consumer 
data.feature.c.user.deliver.and.chat.features.to.hbase meta-data recover from 
ledger 110091138
   2022-04-26T21:40:30.667Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Loading cursor 
data.feature.c.user.deliver.and.chat.features.to.hbase
   2022-04-26T21:40:30.667Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Recovering from 
bookkeeper ledger cursor: data.feature.c.user.deliver.and.chat.features.to.hbase
   2022-04-26T21:40:30.668Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Consumer 
data.feature.c.user.deliver.and.chat.features.to.hbase meta-data recover from 
ledger 110091138
   2022-04-26T21:41:30.773Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Consumer 
data.feature.c.user.deliver.and.chat.features.to.hbase meta-data recover from 
ledger 110091138
   2022-04-26T21:41:30.773Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Loading cursor 
data.feature.c.user.deliver.and.chat.features.to.hbase
   2022-04-26T21:41:30.773Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Recovering from 
bookkeeper ledger cursor: data.feature.c.user.deliver.and.chat.features.to.hbase
   2022-04-26T21:42:30.888Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Recovering from 
bookkeeper ledger cursor: data.feature.c.user.deliver.and.chat.features.to.hbase
   2022-04-26T21:42:30.888Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Consumer 
data.feature.c.user.deliver.and.chat.features.to.hbase meta-data recover from 
ledger 110091138
   2022-04-26T21:42:30.888Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Loading cursor 
data.feature.c.user.deliver.and.chat.features.to.hbase
   2022-04-26T21:43:31.106Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Loading cursor 
data.feature.c.user.deliver.and.chat.features.to.hbase
   2022-04-26T21:43:31.106Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Recovering from 
bookkeeper ledger cursor: data.feature.c.user.deliver.and.chat.features.to.hbase
   2022-04-26T21:43:31.106Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Consumer 
data.feature.c.user.deliver.and.chat.features.to.hbase meta-data recover from 
ledger 110091138
   2022-04-26T21:44:31.391Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Recovering from 
bookkeeper ledger cursor: data.feature.c.user.deliver.and.chat.features.to.hbase
   2022-04-26T21:44:31.391Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Consumer 
data.feature.c.user.deliver.and.chat.features.to.hbase meta-data recover from 
ledger 110091138
   2022-04-26T21:44:31.391Z INFO [bookkeeper-ml-scheduler-OrderedScheduler-4-0] 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Loading cursor 
data.feature.c.user.deliver.and.chat.features.to.hbase
   ```
   
   And it will start a new recover op after 
`managedLedgerMetadataOperationsTimeoutSeconds`, so bookie keep logging `Ledger 
fence request received for ledger`:
   <img width="1648" alt="image" 
src="https://user-images.githubusercontent.com/5058708/165911321-a717e5e1-c161-4631-b5e2-454e3a6321e1.png";>
   
   After near 27h, the recover op finally callback (Actually the topic has 
moved to another node in this moment):
   ```
   2022-04-28T00:25:36.147Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Opened ledger 
110091138 for consumer data.feature.c.user.deliver.and.chat.features.to.hbase. 
rc=0
   2022-04-28T00:25:36.163Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Opened ledger 
110091138 for consumer data.feature.c.user.deliver.and.chat.features.to.hbase. 
rc=0
   2022-04-28T00:25:36.163Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Opened ledger 
110091138 for consumer data.feature.c.user.deliver.and.chat.features.to.hbase. 
rc=0
   2022-04-28T00:25:36.172Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Cursor 
data.feature.c.user.deliver.and.chat.features.to.hbase recovered to position 
110398886:151969
   2022-04-28T00:25:36.173Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Recovery for 
cursor data.feature.c.user.deliver.and.chat.features.to.hbase completed. 
pos=110398886:151969 -- todo=0
   2022-04-28T00:25:36.188Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Cursor 
data.feature.c.user.deliver.and.chat.features.to.hbase recovered to position 
110398886:151969
   2022-04-28T00:25:36.188Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Recovery for 
cursor data.feature.c.user.deliver.and.chat.features.to.hbase completed. 
pos=110398886:151969 -- todo=0
   2022-04-28T00:25:36.189Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Cursor 
data.feature.c.user.deliver.and.chat.features.to.hbase recovered to position 
110398886:151969
   2022-04-28T00:25:36.189Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Recovery for 
cursor data.feature.c.user.deliver.and.chat.features.to.hbase completed. 
pos=110398886:151969 -- todo=0
   2022-04-28T00:25:36.479Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Opened ledger 
110091138 for consumer data.feature.c.user.deliver.and.chat.features.to.hbase. 
rc=0
   2022-04-28T00:25:36.480Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Recovery for 
cursor data.feature.c.user.deliver.and.chat.features.to.hbase completed. 
pos=110398886:151969 -- todo=0
   2022-04-28T00:25:36.480Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Cursor 
data.feature.c.user.deliver.and.chat.features.to.hbase recovered to position 
110398886:151969
   2022-04-28T00:25:36.487Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Opened ledger 
110091138 for consumer data.feature.c.user.deliver.and.chat.features.to.hbase. 
rc=0
   2022-04-28T00:25:36.487Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Opened ledger 
110091138 for consumer data.feature.c.user.deliver.and.chat.features.to.hbase. 
rc=0
   2022-04-28T00:25:36.487Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Opened ledger 
110091138 for consumer data.feature.c.user.deliver.and.chat.features.to.hbase. 
rc=0
   2022-04-28T00:25:36.487Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Opened ledger 
110091138 for consumer data.feature.c.user.deliver.and.chat.features.to.hbase. 
rc=0
   2022-04-28T00:25:36.487Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Opened ledger 
110091138 for consumer data.feature.c.user.deliver.and.chat.features.to.hbase. 
rc=0
   2022-04-28T00:25:36.487Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Opened ledger 
110091138 for consumer data.feature.c.user.deliver.and.chat.features.to.hbase. 
rc=0
   2022-04-28T00:25:36.489Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Opened ledger 
110091138 for consumer data.feature.c.user.deliver.and.chat.features.to.hbase. 
rc=0
   2022-04-28T00:25:36.489Z INFO [BookKeeperClientWorker-OrderedExecutor-2-0] 
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[data/uu/persistent/feature.plain.result.queue-partition-31] Opened ledger 
110091138 for consumer data.feature.c.user.deliver.and.chat.features.to.hbase. 
rc=0
   ```
   
   **Additional context**
   - OS: `3.10.0-1160.53.1.el7.x86_64`
   - java: `1.8.0_181`
   - pulsar: `2.8.3`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to