gaozhangmin opened a new issue, #3408:
URL: https://github.com/apache/bookkeeper/issues/3408
Our production environment failed last week: all bookies were killed by
direct memory OOM. This happened after one bookie's disk broke and we
tried to take that bookie offline. After auditBookie was triggered, direct
memory usage on all bookies kept increasing, which suggests a memory leak.
The ReplicationWorker log follows; x.x.x.x is the IP of the lost bookie:
```
2022-07-14 21:50:41.721 [ReplicationWorker] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to
bookie: null/x.x.x.x:3181, current state CONNECTING :
2022-07-14 21:50:41.723 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L61080496 E1502 from bookie: x.x.x.x:3181
2022-07-14 21:50:41.724 [ReplicationWorker] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to
x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not running
2022-07-14 21:50:41.724 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L61080496 E1506 from bookie: x.x.x.x:3181
2022-07-14 21:50:41.724 [ReplicationWorker] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to
x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not running
2022-07-14 21:50:41.724 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L61080496 E1510 from bookie: x.x.x.x:3181
2022-07-14 21:50:41.725 [ReplicationWorker] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to
x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not running
2022-07-14 21:50:41.725 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L61080496 E1514 from bookie: x.x.x.x:3181
2022-07-14 21:50:41.725 [ReplicationWorker] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to
x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not running
2022-07-14 21:50:41.725 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L61080496 E1518 from bookie: x.x.x.x:3181
2022-07-14 21:50:41.725 [ReplicationWorker] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to
x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not running
2022-07-14 21:50:42.403 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L61080496 E1974 from bookie: x.x.x.x:3181
2022-07-14 21:50:42.440 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L61080496 E1978 from bookie: x.x.x.x:3181
2022-07-14 21:50:42.525 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L61080496 E1982 from bookie: x.x.x.x:3181
2022-07-14 21:50:42.593 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L61080496 E1986 from bookie: x.x.x.x:3181
2022-07-14 21:50:42.665 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L61080496 E1990 from bookie: x.x.x.x:3181
2022-07-14 21:50:42.706 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L61080496 E1994 from bookie: x.x.x.x:3181
2022-07-14 21:50:42.776 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L61080496 E1998 from bookie: x.x.x.x:3181
2022-07-14 21:50:44.271 [BookKeeperClientWorker-OrderedExecutor-8-0] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to
x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not running
2022-07-14 21:50:44.271 [BookKeeperClientWorker-OrderedExecutor-8-0] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to
x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not running
2022-07-14 21:50:44.271 [BookKeeperClientWorker-OrderedExecutor-8-0] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to
x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not running
2022-07-14 21:50:44.271 [BookKeeperClientWorker-OrderedExecutor-8-0] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to
x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not running
2022-07-14 21:50:44.271 [BookKeeperClientWorker-OrderedExecutor-8-0] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to
x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not running
2022-07-14 21:50:44.271 [BookKeeperClientWorker-OrderedExecutor-8-0] ERROR
org.apache.bookkeeper.proto.PerChannelBookieClient - Cannot connect to
x.x.x.x:3181 as endpoint resolution failed (probably bookie is down) err
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException:
Cannot resolve bookieId x.x.x.x:3181, bookie does not exist or it is not running
2022-07-14 22:10:19.830 [BookKeeperClientWorker-OrderedExecutor-41-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie operation timeout
while reading L60558419 E359 from bookie: 10.71.168.13:3181
2022-07-14 22:10:19.830 [BookKeeperClientWorker-OrderedExecutor-41-0] ERROR
org.apache.bookkeeper.client.LedgerFragmentReplicator - BK error reading ledger
entry: 434
2022-07-14 22:10:19.831 [BookKeeperClientWorker-OrderedExecutor-41-0] ERROR
org.apache.bookkeeper.proto.BookkeeperInternalCallbacks - Error in multi
callback : -23
is (-1, rc = null)
2022-07-14 22:10:19.830 [BookKeeperClientWorker-OrderedExecutor-41-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie operation timeout
while reading L60558419 E378 from bookie: 1.1.1.1:3181
2022-07-14 22:10:19.830 [BookKeeperClientWorker-OrderedExecutor-41-0] ERROR
org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed:
L60558419 E378-E378, Sent to [x.x.x.x:3181, 1.1.1.1:3181], Heard from [] :
bitset = {}, Error = 'Bookie operation timeout'. First unread entry is (-1, rc
= null)
```
In addition, brokers kept quarantining bookies, and in the end all bookies
crashed.
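
To make a log like the one above easier to triage, here is a minimal Python sketch that tallies failed entry reads per ledger, bookie, and error reason. It is not part of BookKeeper: the function names `join_records` and `summarize_read_errors` are hypothetical, and the regular expressions assume the wrapped `PendingReadOp - Error: ...` layout shown in the excerpt.

```python
import re
from collections import Counter

# Start of a log record, e.g. "2022-07-14 21:50:41.723 [thread] INFO".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \[")

# PendingReadOp read failures, matched after a wrapped record is joined.
# Assumption: the message layout is the one seen in the excerpt above.
READ_ERROR = re.compile(
    r"PendingReadOp - Error: (?P<reason>.+?) while reading "
    r"L(?P<ledger>\d+) E(?P<entry>\d+) from bookie: (?P<bookie>\S+)"
)


def join_records(lines):
    """Join wrapped lines so that each log record is a single string."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            # Continuation of the previous record (the mail client wrapped it).
            records[-1] += " " + line
    return records


def summarize_read_errors(lines):
    """Count failed entry reads per (ledger, bookie, reason)."""
    counts = Counter()
    for record in join_records(lines):
        match = READ_ERROR.search(record)
        if match:
            key = (match["ledger"], match["bookie"], match["reason"])
            counts[key] += 1
    return counts


# A few records copied from the excerpt above.
sample = """\
2022-07-14 21:50:41.723 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L61080496 E1502 from bookie: x.x.x.x:3181
2022-07-14 21:50:41.724 [BookKeeperClientWorker-OrderedExecutor-8-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie handle is not
available while reading L61080496 E1506 from bookie: x.x.x.x:3181
2022-07-14 22:10:19.830 [BookKeeperClientWorker-OrderedExecutor-41-0] INFO
org.apache.bookkeeper.client.PendingReadOp - Error: Bookie operation timeout
while reading L60558419 E359 from bookie: 10.71.168.13:3181
""".splitlines()

summary = summarize_read_errors(sample)
for (ledger, bookie, reason), count in summary.most_common():
    print(f"L{ledger} {bookie} {reason}: {count}")
```

Run against the full ReplicationWorker log, a tally like this may help show whether the failed reads are limited to the lost bookie or also hit healthy bookies (as the later timeout lines suggest).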
