ppawel commented on issue #36991: URL: https://github.com/apache/beam/issues/36991#issuecomment-3613571410
> [@ppawel](https://github.com/ppawel), how fast can you learn about data loss? If you restart workers (Cloud compute restart), are you able to see those messages processed?

We see data loss pretty quickly because the logic in the downstream stages expects those messages. They are not processed after restarting, since they no longer exist in the Solace queue. Solace does not redeliver them because the messages were acked, so from Solace's perspective everything is fine and the messages are removed from the queue as usual.

What we have seen is that those messages do go through the SolaceIO parsing function (we have a logging statement there), but they are gone afterwards: downstream transforms do not see them once those Dataflow worker warnings appear. I am not an expert on Beam/Dataflow internals, but I would expect the work item containing the missing messages to be retried, and that is not happening.

Between SolaceIO and the `Redistribute` transform in our pipeline there are only some trivial steps that cannot filter out messages by themselves, so the current theory is that SolaceIO is not interacting correctly/fully with some part of the Beam/Dataflow rebalancing lifecycle. Maybe that theory is wrong and SolaceIO is behaving correctly, but honestly we are grasping at straws with this issue at this point, so I wanted to check whether you have any ideas from the SolaceIO side. If you think it is unlikely that something inside SolaceIO would cause this behavior, then this bug can be closed; obviously I don't want to abuse this issue tracker for unrelated problems, and we will perhaps pursue a support ticket with Google Cloud instead.
