poorbarcode opened a new pull request, #4557:
URL: https://github.com/apache/bookkeeper/pull/4557
### Motivation
**Background: the normal steps of adding an entry**
- Gains BKC's IO thread
- Triggers the callback after gaining the IO thread
- Write data to servers
- Triggers `PendingAddOp.writeComplete` after receiving the response from BK
servers.
- Triggers all succeed callbacks in the pending queue
**Background: the steps of disconnection**
- Gains BKC's IO thread
- Triggers a failed callback after gaining the IO thread
- Triggers a failed `PendingAddOp.writeComplete`. You can reproduce this
flow by the new test `testAddEntriesCallbackWithBKClientThread`
- **(Highlight)** If the writing is already complete, this process will
also trigger all successful callbacks in the pending queue even if the current
writing is failed<sup>[code-1]</sup>
---
**Issue-1: write stuck due to pending add callback by multiple threads**
- Settings
- Ensembles: `3`
- WriteQuoram: `3`
- AckQuoram: `2`
| steps | write entries | `client->BK1` | `client->BK2` | `client-> BK3`|
| --- | --- | --- | --- | --- |
| 1 | write BKs |
| 2 | | start writing | start writing | start writing |
| 3 | | write success<br> `ack`: `1/3` | write success<br> `ack`: `2/3` | |
| 4 | mark the writing as `complete` since ack quorum is `2/3` |
| 5 | | | | connected and disconnected |
| 6 | | | | Triggers a failed `PendingAddOp.writeComplete` |
| 8 | | | `thread`: `bookkeeper workers` | `thread`: `client-server io` |
| 7 | | | Trigger all succeed callbacks in the pending queue | Trigger all
succeed callbacks in the pending queue<sup>[code-1]</sup> |
Since there are multiple threads that will trigger all successful callbacks
in the pending queue, it may cause the following race
condition<sup>[code-2]</sup>
- BTW, the `thread-1` and `thread-2` may be triggered by different
`PendingAddOps`
| steps | `thread-1` | `thread-2` |
| --- | --- | --- |
| 1 | peek pending addOp from the queue | peek pending addOp from the queue |
| 2 | check it is the first item: `success`| check it is the first item:
`success`|
| 3 | call `queue.pop` | call `queue.pop`|
| 4 | | Issue: the second OP will never get a callback triggering|
**[1] code link**:
https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L307
```java
// PendingAddOp.writeComplete
public synchronized void writeComplete(int rc, long ledgerId, long entryId,
BookieId addr, Object ctx) {
if (completed) {
sendAddSuccessCallbacks();
maybeRecycle();
return;
}
```
**[2] code-link**:
https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L2092-L2124
```java
// LedgerHandle.sendAddSuccessCallbacks
void sendAddSuccessCallbacks() {
while ((pendingAddOp = pendingAddOps.peek()) != null &&
!changingEnsemble) {
if (!pendingAddOp.completed) {
return;
}
pendingAddOps.remove();
explicitLacFlushPolicy.updatePiggyBackedLac(lastAddConfirmed);
pendingAddsSequenceHead = pendingAddOp.entryId;
pendingAddOp.submitCallback(BKException.Code.OK);
}
}
```
---
**Issue-2: ledger will be closed with a incorrect length**
Since the task that triggers all successful callbacks in the pending queue
may be run in `IO` thread, the task "triggers all successful callbacks in the
pending queue" and closing ledger may concurrectly execute
| steps | `workeer-thread` | `io-thread` |
| --- | --- | --- |
| 1 | start closing ledger | peek pending addOp from the queue |
| 2 | | check it is the first item: `success`|
| 3 | drain pending adds<sup>[code-3]</sup> |
| 4 | reduce `ledger.length` which was popped out from the
queue<sup>[code-3]</sup> | |
| 5 | | call `queue.pop` and pop nothing|
| 6 | | update `ledger.LAC` |
The variables `ledger.LAC` and `ledger.length` do not match
**[3] code-link**:
https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L2076-L2084
```java
synchronized List<PendingAddOp> drainPendingAddsAndAdjustLength() {
PendingAddOp pendingAddOp;
List<PendingAddOp> opsDrained = new
ArrayList<PendingAddOp>(pendingAddOps.size());
while ((pendingAddOp = pendingAddOps.poll()) != null) {
addToLength(-pendingAddOp.entryLength);
opsDrained.add(pendingAddOp);
}
return opsDrained;
}
```
---
**The issue we encountered**
A pulsar topic is stuck at `ClosingLedger` state
**pulsar topic stats**
```json
{
"entriesAddedCounter" : 12485917,
"numberOfEntries" : 126383,
"totalSize" : 53668291,
"currentLedgerEntries" : 137418,
"currentLedgerSize" : 58374388,
"lastLedgerCreatedTimestamp" : "2025-02-06T09:13:45.371Z",
"waitingCursorsCount" : 1,
"pendingAddEntriesCount" : 8342123,
"lastConfirmedEntry" : "41901:126416",
"state" : "ClosingLedger",
"ledgers" : [ {
"ledgerId" : 41901,
"entries" : 0,
"size" : 0,
"offloaded" : false,
"underReplicated" : false
} ],
"cursors" : {
...
```
**logs**
```
2025-02-06T09:14:06,421+0000 [BookKeeperClientWorker-OrderedExecutor-2-0]
WARN org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (41901,
126417) to bookie (2, bookie-0:3181): Bookie handle is not available"
2025-02-06T09:14:06,421+0000 [BookKeeperClientWorker-OrderedExecutor-2-0]
WARN org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (41901,
126412) to bookie (2, bookie-0:3181): Bookie handle is not available"
2025-02-06T09:14:06,421+0000 [pulsar-io-3-5] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to
bookie: bookie-0:3181 [id: 0xbc663b9b, L:/10.247.4.112:59578 -
R:bookie-0/10.247.4.38:3181]"
2025-02-06T09:14:06,421+0000 [pulsar-io-3-5] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id:
0xbc663b9b, L:/10.247.4.112:59578 - R:bookie-0/10.247.4.38:3181] authenticated
as BookKeeperPrincipal{ANONYMOUS}"
2025-02-06T09:14:06,423+0000 [BookKeeperClientWorker-OrderedExecutor-2-0]
INFO org.apache.bookkeeper.client.LedgerHandle - New Ensemble: [bookie-3:3181,
bookie-2:3181, bookie-1:3181] for ledger: 41901"
2025-02-06T09:14:06,423+0000 [pulsar-io-3-5] WARN
org.apache.bookkeeper.proto.PerChannelBookieClient - Exception caught on:[id:
0xbc663b9b, L:/10.247.4.112:59578 - R:bookie-0/10.247.4.38:3181] cause:
recvAddress(..) failed: Connection reset by peer"
2025-02-06T09:14:06,423+0000 [pulsar-io-3-5] INFO
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie
channel [id: 0xbc663b9b, L:/10.247.4.112:59578 ! R:bookie-0/10.247.4.38:3181]"
2025-02-06T09:14:06,423+0000 [pulsar-io-3-6] WARN
org.apache.bookkeeper.mledger.impl.OpAddEntry - [{topic}] The add op is
terminal legacy callback for entry 41901-126416 adding."
2025-02-06T09:14:06,423+0000 [pulsar-io-3-6] WARN
org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (41901,
126426) to bookie (2, bookie-1:3181): Bookie handle is not available"
2025-02-06T09:14:06,423+0000 [BookKeeperClientWorker-OrderedExecutor-3-0]
WARN org.apache.bookkeeper.client.BookieWatcherImpl - replaceBookie for
bookie: bookie-1:3181 in ensemble: [bookie-3:3181, bookie-2:3181,
bookie-1:3181] is not adhering to placement policy and chose bookie-0:3181.
excludedBookies [bookie-1:3181] and quarantinedBookies [bookie-2:3181]"
```
### Changes
Switch the thread to `Bookkeeper works` if the connection is broken.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]