diegovallone opened a new issue, #2760:
URL: https://github.com/apache/pekko/issues/2760

   When using Reliable Delivery with a standard `ProducerController` backed by 
a durable queue (e.g., `EventSourcedProducerQueue`), restarting the producer 
causes an immediate crash if all previous messages were confirmed before the 
restart.
   
   The `ProducerController` successfully reloads its state but then crashes with 
the following error before a `ConsumerController` can even establish demand:
   `java.lang.IllegalStateException: Unexpected Msg when no demand, requested true, requestedSeqNr 1, currentSeqNr X`
(where `X` is the actual restored sequence number).
   
   Note: The `WorkPullingProducerController` handles state initialization 
differently and is unaffected by this issue.
   
   **Steps to Reproduce:**
   I have verified this behavior using an isolated test with Pekko's in-memory 
journal.
   
   1. Start a `ProducerController` with a durable queue and a 
`ConsumerController`.
   2. Send a message and let the consumer receive and confirm it (which 
empties the unconfirmed buffer).
   3. Wait for the producer to receive the next `RequestNext` from the 
`ProducerController`, ensuring that the confirmed state has been fully written to 
the durable queue.
   4. Stop and restart the `ProducerController` to simulate a crash or 
deployment.
   5. The `ProducerController` emits a `RequestNext` with `currentSeqNr = 1` 
instead of the restored sequence number. When the producer supplies the next 
message, the controller fails its internal demand check and crashes with an 
`IllegalStateException`.
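   The steps above can be approximated without Pekko using a toy model. This is a sketch under stated assumptions: the stored `currentSeqNr` is taken to be the next sequence number to use, the guard is the one inferred from the error message, and `Stored`, `Controller`, `restartBuggy`, and `sendNext` are hypothetical names. It does not replace the attached test, which runs against the real `ProducerController`.

```scala
import scala.util.Try

// Hypothetical toy types (not Pekko's): what the durable queue holds, and the
// controller fields that matter for the demand check.
final case class Stored(currentSeqNr: Long, confirmedSeqNr: Long)
final case class Controller(requested: Boolean, requestedSeqNr: Long, currentSeqNr: Long, confirmedSeqNr: Long)

// Restart as described in the report: the loaded seqNrs are kept, a RequestNext
// is issued (requested = true), but requestedSeqNr is reset to 1.
def restartBuggy(stored: Stored): Controller =
  Controller(
    requested = true,
    requestedSeqNr = 1L,
    currentSeqNr = stored.currentSeqNr,
    confirmedSeqNr = stored.confirmedSeqNr)

// The producer supplies its next message; the guard inferred from the error text
// rejects it. Post-send bookkeeping is reduced to bumping currentSeqNr.
def sendNext(c: Controller): Controller = {
  if (!c.requested || c.currentSeqNr > c.requestedSeqNr)
    throw new IllegalStateException(
      s"Unexpected Msg when no demand, requested ${c.requested}, " +
        s"requestedSeqNr ${c.requestedSeqNr}, currentSeqNr ${c.currentSeqNr}")
  c.copy(currentSeqNr = c.currentSeqNr + 1)
}

// Steps 1-3: message 1 was sent and confirmed, so the next seqNr to use is 2.
val stored = Stored(currentSeqNr = 2L, confirmedSeqNr = 1L)

// Steps 4-5: restart, then answer the RequestNext with the next message.
val restarted = restartBuggy(stored)
val outcome = Try(sendNext(restarted))
println(outcome)
```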
   
   (See the attached **ProducerControllerBugTest.scala**, uploaded as a `.txt` 
file and linked below, for a fully reproducible test case. Note that the test is 
inverted: it passes when it reproduces the bug.)
   
[ProducerControllerBugTest.txt](https://github.com/user-attachments/files/26105216/ProducerControllerBugTest.txt)
   
   **Root Cause:**
   In **ProducerControllerImpl.scala**, the state recovery logic ignores the 
loaded sequence number when initializing the demand window and requesting the 
next message from the local producer:
   
   In `createState`, `requestedSeqNr` is hardcoded to `1L` instead of adopting 
`loadedState.currentSeqNr`.
   
   In `becomeActive`, when `state.unconfirmed` is empty, it passes hardcoded 
`1L` and `0L` to the `RequestNext` message and the flight recorder instead of 
`state.currentSeqNr` and `state.confirmedSeqNr`.
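   Both code paths can be sketched side by side, together with the change the root-cause description implies. The types and functions (`LoadedState`, `ProducerState`, `RequestNextInfo`, `createStateBuggy`/`createStateFixed`, `firstRequestBuggy`/`firstRequestFixed`) are hypothetical simplifications; only the all-confirmed case from this report is modelled, and this is not the contents of the attached diff.

```scala
// Hypothetical, simplified shapes; the real ProducerControllerImpl state has more fields.
final case class LoadedState(currentSeqNr: Long, confirmedSeqNr: Long)
final case class ProducerState(
    requested: Boolean,
    currentSeqNr: Long,
    confirmedSeqNr: Long,
    requestedSeqNr: Long,
    unconfirmed: Vector[Long])
final case class RequestNextInfo(currentSeqNr: Long, confirmedSeqNr: Long)

// createState as described: requestedSeqNr ignores the loaded state.
// Loaded unconfirmed messages are omitted (all-confirmed case only).
def createStateBuggy(loaded: LoadedState): ProducerState =
  ProducerState(
    requested = false,
    currentSeqNr = loaded.currentSeqNr,
    confirmedSeqNr = loaded.confirmedSeqNr,
    requestedSeqNr = 1L,
    unconfirmed = Vector.empty)

// Implied change: open the demand window at the restored currentSeqNr.
def createStateFixed(loaded: LoadedState): ProducerState =
  createStateBuggy(loaded).copy(requestedSeqNr = loaded.currentSeqNr)

// becomeActive as described: with nothing unconfirmed, RequestNext gets 1 and 0.
// None stands for "no immediate RequestNext"; that branch is out of scope here.
def firstRequestBuggy(s: ProducerState): Option[RequestNextInfo] =
  if (s.unconfirmed.isEmpty) Some(RequestNextInfo(1L, 0L)) else None

// Implied change: report the restored sequence numbers instead.
def firstRequestFixed(s: ProducerState): Option[RequestNextInfo] =
  if (s.unconfirmed.isEmpty) Some(RequestNextInfo(s.currentSeqNr, s.confirmedSeqNr)) else None

val fresh = LoadedState(currentSeqNr = 1L, confirmedSeqNr = 0L)
val restored = LoadedState(currentSeqNr = 4L, confirmedSeqNr = 3L)

println(firstRequestBuggy(createStateBuggy(restored))) // asks for seqNr 1 again
println(firstRequestFixed(createStateFixed(restored))) // asks for the restored seqNr 4
println(createStateFixed(fresh) == createStateBuggy(fresh)) // compares fresh-start states
```

   With the fixed values, a restored controller has `requestedSeqNr == currentSeqNr`, so a guard of the form `currentSeqNr > requestedSeqNr` would accept the producer's next message, while a fresh start (`currentSeqNr = 1`) yields exactly the same state as before.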
   
   (See the attached **bugfix.txt**, which contains a `git diff` of a proposed 
fix that appears to resolve this issue.)
   [bugfix.txt](https://github.com/user-attachments/files/26105226/bugfix.txt)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

