ppawel commented on issue #36991:
URL: https://github.com/apache/beam/issues/36991#issuecomment-3613571410

   > [@ppawel](https://github.com/ppawel) , how fast can you learn about data 
loss? if you restart workers (Cloud compute restart), are you able to see those 
messages processed?
   
   We see data loss pretty quickly because the logic in the downstream stages 
is expecting those messages.
   
   They are not processed after restarting because they no longer exist in the 
Solace queue. Solace does not redeliver them: the messages were acked, so from 
Solace's perspective everything is fine and they were removed from the queue as 
usual. What we have seen is that those messages pass through the SolaceIO 
parsing function (we have a logging statement there) but are gone afterwards, 
i.e. downstream transforms never see them once those Dataflow worker warnings 
appear.
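   To make the suspected failure mode concrete, here is a minimal plain-Java model (all names illustrative, not SolaceIO internals) of why ack timing matters: if a connector acks at read/parse time rather than after checkpoint finalization, a worker restart between the read and the downstream commit loses the message permanently, because the broker will not redeliver an acked message.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative broker queue that only redelivers un-acked messages.
class Broker {
    private final Deque<String> queue = new ArrayDeque<>();
    void publish(String m) { queue.add(m); }
    String read() { return queue.peek(); }    // deliver without removing
    void ack(String m) { queue.remove(m); }   // acked messages are gone for good
    List<String> redeliverable() { return new ArrayList<>(queue); }
}

public class AckTimingModel {
    // Strategy A (lossy): ack as soon as the message is read/parsed.
    static List<String> ackOnRead(Broker b, boolean crashBeforeCommit) {
        List<String> committedDownstream = new ArrayList<>();
        String m;
        while ((m = b.read()) != null) {
            b.ack(m);                                 // acked immediately
            if (crashBeforeCommit) {
                return committedDownstream;           // worker restarts here: m is lost
            }
            committedDownstream.add(m);               // downstream commit
        }
        return committedDownstream;
    }

    // Strategy B (safe): ack only after the downstream commit, i.e. on
    // checkpoint finalization. A restart before the ack leaves the
    // message redeliverable.
    static List<String> ackOnFinalize(Broker b, boolean crashBeforeCommit) {
        List<String> committedDownstream = new ArrayList<>();
        String m;
        while ((m = b.read()) != null) {
            if (crashBeforeCommit) {
                return committedDownstream;           // restart: message still queued
            }
            committedDownstream.add(m);               // downstream commit
            b.ack(m);                                 // ack after commit
        }
        return committedDownstream;
    }

    public static void main(String[] args) {
        Broker a = new Broker();
        a.publish("m1");
        // m1 is acked but never committed downstream, and the broker
        // will not redeliver it: permanent loss.
        System.out.println("ack-on-read: committed=" + ackOnRead(a, true)
            + " redeliverable=" + a.redeliverable());

        Broker b = new Broker();
        b.publish("m1");
        // m1 is still in the queue and is redelivered after restart.
        System.out.println("ack-on-finalize: committed=" + ackOnFinalize(b, true)
            + " redeliverable=" + b.redeliverable());
    }
}
```

   The symptoms described above (message parsed, acked on the broker, never seen downstream, not retried) match strategy A's failure window, which is why it would be interesting to know exactly when SolaceIO acks relative to checkpoint finalization.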
   
   I am not an expert on Beam/Dataflow internals, but I would expect the work 
items containing the missing messages to be retried, and that is not happening. 
Between SolaceIO and the `Redistribute` transform in our pipeline there are 
only a few trivial steps, none of which could filter out the messages by 
themselves... so the current theory is that SolaceIO is not interacting 
correctly/fully with some part of the Beam/Dataflow work-rebalancing lifecycle. 
That theory may be wrong and SolaceIO may be behaving correctly, but honestly 
we are grasping at straws with this issue at this point, so I wanted to check 
whether you have any ideas from the SolaceIO side.
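   One cheap way to narrow this down is to count elements at each stage boundary between SolaceIO and `Redistribute`; the boundary where the entering and leaving counts diverge pinpoints the step that drops data. In a real pipeline this would live in a pass-through DoFn using Beam's `Metrics.counter(...)` so the counts surface in the Dataflow UI; the standalone sketch below (all names illustrative) just shows the bookkeeping.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative stage-boundary counters for localizing element loss.
public class StageCounters {
    private final Map<String, AtomicLong> counts = new ConcurrentHashMap<>();

    // Call once per element at each stage boundary; returns the element
    // unchanged so it can be dropped into an existing chain of steps.
    public <T> T passThrough(String stage, T element) {
        counts.computeIfAbsent(stage, k -> new AtomicLong()).incrementAndGet();
        return element;
    }

    public long count(String stage) {
        AtomicLong c = counts.get(stage);
        return c == null ? 0 : c.get();
    }

    // A nonzero result between adjacent boundaries identifies the step
    // where elements disappear.
    public long dropsBetween(String before, String after) {
        return count(before) - count(after);
    }
}
```

   If the counts already diverge at the boundary right after the SolaceIO parsing function, that would support the theory that the loss happens inside the connector's interaction with the runner rather than in any of the trivial steps.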
   
   If you think it is unlikely that something inside SolaceIO would cause this 
behavior, then this bug can be closed; I obviously don't want to abuse this 
issue tracker for unrelated issues, and we will instead pursue a support 
ticket with Google Cloud.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
