devinbost edited a comment on issue #6054:
URL: https://github.com/apache/pulsar/issues/6054#issuecomment-800868920
I created a test cluster (on fast hardware) specifically to reproduce this
issue. The flow is very simple, using plain Java functions with no external
dependencies, on Pulsar **2.6.3**. As soon as we started flowing data
(around 4k msg/sec, averaging 140 KB/msg), the bug appeared within seconds
(as expected), blocking the flow and causing a backlog to accumulate.
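For context, the functions in this flow are simple Java functions along these lines. This is just a representative sketch (a trivial pass-through with a placeholder class name), not our actual code:

```java
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

/**
 * Minimal pass-through function, representative of the plain Java functions
 * in this flow (no external dependencies). Placeholder name, not our real code.
 */
public class PassThroughFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) {
        // Returning the value causes the functions framework to publish it
        // to the function's configured output topic.
        return input;
    }
}
```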
I looked up the broker serving the frozen topic and captured heap dumps and
thread dumps of the broker and of the functions running on it.
I couldn't find anything abnormal in the thread dumps. However, the topic
stats and internal stats seem to hold some clues.
I've attached the topic stats and internal stats for the topic immediately
upstream of the frozen topic, the frozen topic itself, and the topic
immediately downstream of it.
The flow looks like this:
-> `first-function` -> `first-topic` -> `second-function` -> `second-topic` -> `third-function` -> `third-topic` -> `fourth-function` ->
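Each hop is wired the usual way: the function's input is the previous topic and its output is the next topic. Roughly what that looks like for `third-function` (a sketch only; the tenant, namespace, service URL, jar path, and class name are placeholders, not our deployment code):

```java
import java.util.Collections;

import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.functions.FunctionConfig;

public class DeployThirdFunction {
    public static void main(String[] args) throws Exception {
        // third-function consumes from second-topic and produces to third-topic.
        FunctionConfig config = FunctionConfig.builder()
                .tenant("public")                              // placeholder
                .namespace("default")                          // placeholder
                .name("third-function")
                .className("com.example.PassThroughFunction")  // placeholder class
                .inputs(Collections.singletonList("persistent://public/default/second-topic"))
                .output("persistent://public/default/third-topic")
                .parallelism(1)
                .build();

        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://broker-host:8080")     // placeholder
                .build()) {
            admin.functions().createFunction(config, "/path/to/functions.jar"); // placeholder jar
        }
    }
}
```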
The `second-topic` is the one that froze and started accumulating backlog.
Observations from the stats:

- `second-topic` reports -45 available permits for its consumer, `third-function`.
- All three topics report 0 `pendingReadOps`.
- `first-topic` and `third-topic` have `waitingReadOp` = true, indicating their subscriptions are waiting for messages.
- `second-topic` has `waitingReadOp` = false, indicating its subscription hasn't caught up or isn't waiting for messages.
- `second-topic` reports `waitingCursorsCount` = 0, so it has no cursors waiting for messages.
- `third-topic` has `pendingAddEntriesCount` = 81, indicating it's waiting for write requests to complete.
- `first-topic` and `second-topic` have `pendingAddEntriesCount` = 0.
- `third-topic` is in the state `ClosingLedger`.
- `first-topic` and `second-topic` are in the state `LedgerOpened`.
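(All of the numbers above, and the cursor positions below, are read straight off the topic stats and internal stats objects. For anyone who wants to pull the same fields programmatically, here's a minimal sketch using the Java admin client; the service URL, tenant, and namespace are placeholders, and field access is written against the 2.6.x stats classes, which expose public fields:)

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistentTopicInternalStats;
import org.apache.pulsar.common.policies.data.TopicStats;

public class InspectTopic {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://broker-host:8080")          // placeholder
                .build()) {
            String topic = "persistent://public/default/second-topic"; // placeholder

            // Consumer view: available permits per consumer (topic stats).
            TopicStats stats = admin.topics().getStats(topic);
            stats.subscriptions.forEach((sub, subStats) ->
                    subStats.consumers.forEach(c -> System.out.printf(
                            "sub=%s consumer=%s availablePermits=%d%n",
                            sub, c.consumerName, c.availablePermits)));

            // Managed-ledger view: ledger state, pending writes, waiting cursors
            // (internal stats).
            PersistentTopicInternalStats internal = admin.topics().getInternalStats(topic);
            System.out.printf("state=%s pendingAddEntriesCount=%d waitingCursorsCount=%d%n",
                    internal.state, internal.pendingAddEntriesCount, internal.waitingCursorsCount);
            internal.cursors.forEach((name, cursor) -> System.out.printf(
                    "cursor=%s waitingReadOp=%b pendingReadOps=%d markDelete=%s read=%s%n",
                    name, cursor.waitingReadOp, cursor.pendingReadOps,
                    cursor.markDeletePosition, cursor.readPosition));
        }
    }
}
```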
Cursor positions:

- `second-topic`'s cursor has `markDeletePosition` = 17525:0 and `readPosition` = 17594:9.
- `third-topic`'s cursor has `markDeletePosition` = 17551:9 and `readPosition` = 17551:10.

So `third-topic`'s cursor's `readPosition` is adjacent to its `markDeletePosition`. However, `second-topic`'s cursor's `readPosition` is farther ahead than `third-topic`'s `readPosition`.
Is it unusual for a downstream topic's cursor to have a `readPosition`
farther ahead (a larger position) than the `readPosition` of the topic
immediately upstream of it, when that upstream topic is the downstream
topic's only source of messages and no more than a few hundred thousand
messages have been sent through the pipe?
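To spell out what "farther ahead" means here: these positions are `ledgerId:entryId` pairs and compare component-wise, ledger id first, then entry id. A tiny sketch of that comparison using the values above (the two cursors live in different managed ledgers, so this ordering is only a rough indicator, not a message count):

```java
import java.util.Comparator;

/** Compares managed-ledger positions of the form "ledgerId:entryId" component-wise. */
public class PositionCompare {
    static final Comparator<long[]> POSITION_ORDER =
            Comparator.<long[]>comparingLong(p -> p[0]).thenComparingLong(p -> p[1]);

    static long[] parse(String position) {
        String[] parts = position.split(":");
        return new long[] { Long.parseLong(parts[0]), Long.parseLong(parts[1]) };
    }

    public static void main(String[] args) {
        long[] secondTopicRead = parse("17594:9");  // second-topic cursor readPosition
        long[] thirdTopicRead  = parse("17551:10"); // third-topic cursor readPosition

        // Prints true: second-topic's cursor readPosition sorts after third-topic's.
        System.out.println(POSITION_ORDER.compare(secondTopicRead, thirdTopicRead) > 0);
    }
}
```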
It's silly, but GitHub won't let me attach .json files, so I had to upload
them as .txt files. In the attached zip, they keep the .json extension for
convenience.
[first-topic-internal-stats.json.txt](https://github.com/apache/pulsar/files/6154489/first-topic-internal-stats.json.txt)
[first-topic-stats.json.txt](https://github.com/apache/pulsar/files/6154490/first-topic-stats.json.txt)
[second-topic-internal-stats.json.txt](https://github.com/apache/pulsar/files/6154491/second-topic-internal-stats.json.txt)
[second-topic-stats.json.txt](https://github.com/apache/pulsar/files/6154492/second-topic-stats.json.txt)
[third-topic-internal-stats.json.txt](https://github.com/apache/pulsar/files/6154493/third-topic-internal-stats.json.txt)
[third-topic-stats.json.txt](https://github.com/apache/pulsar/files/6154494/third-topic-stats.json.txt)
[stats.zip](https://github.com/apache/pulsar/files/6154471/stats.zip)
I've also attached the broker thread dump:
[thread_dump_3-16.txt](https://github.com/apache/pulsar/files/6154506/thread_dump_3-16.txt)
I can't attach the heap dump until I'm able to reproduce this bug with
synthetic data, but in the meantime, if anyone wants me to look up specific
things in it, I'm happy to do that.