xuanswe commented on issue #34000:
URL: https://github.com/apache/beam/issues/34000#issuecomment-2661572572

   > If any exception happens during the draining, the job would be stuck.

   Yes, it was stuck. I am simulating the situation again, and this time the 
   job cannot even drain :).
   
   > You probably should open a Dataflow support ticket.

   I am only simulating the situation with my personal free account.
   
   > Without knowing your Dataflow job details, it will be hard to know what 
   went wrong.

   The scenario is simple: read messages from Pub/Sub using PubsubIO, then 
   throw an exception after the first fused stage. A minimal sketch of the 
   scenario is below.
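
   For reference, here is a rough sketch of the repro, assuming a `Reshuffle` 
   to break fusion so the failing `DoFn` lands after the first fused stage 
   (the subscription name is a placeholder):

   ```java
   import org.apache.beam.sdk.Pipeline;
   import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
   import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
   import org.apache.beam.sdk.options.PipelineOptionsFactory;
   import org.apache.beam.sdk.transforms.DoFn;
   import org.apache.beam.sdk.transforms.ParDo;
   import org.apache.beam.sdk.transforms.Reshuffle;

   public class CrashAfterFirstFusedStage {
     public static void main(String[] args) {
       Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

       p.apply("ReadFromPubsub", PubsubIO.readMessages()
               .fromSubscription("projects/my-project/subscriptions/my-sub"))
        // Reshuffle breaks fusion, so the next DoFn runs in a later fused stage.
        .apply("BreakFusion", Reshuffle.viaRandomKey())
        .apply("AlwaysFail", ParDo.of(new DoFn<PubsubMessage, Void>() {
          @ProcessElement
          public void process(@Element PubsubMessage msg) {
            // Simulate an unrecoverable processing failure.
            throw new RuntimeException("simulated failure after the first fused stage");
          }
        }));

       p.run();
     }
   }
   ```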
   
   The point is that the user has to drain the job and do a lot of extra work 
   just to avoid losing Pub/Sub messages.
   
   I opened this ticket to show that, for some common situations, there is a 
   better way.
   If the pipeline is idempotent, we don't need to care about draining at all.
   We can just disable the auto-ack in PubsubIO and manually ack at the end of 
   the pipeline, and the problem is solved.
   No more draining, no more worrying about how Dataflow works.
   We just let the job crash and deploy a new one, without caring about the 
   old job.
   Because the unprocessed messages have not been acked yet, the new job will 
   receive them again.
   
   So, my proposal in this ticket is to support something like 
   `.withAutoAckOnSuccess(false)` in PubsubIO; a rough sketch of the intended 
   usage follows.
   What is your opinion?
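
   For illustration only: `withAutoAckOnSuccess` and the terminal 
   `PubsubAck.acknowledge()` transform below are hypothetical names invented 
   for this sketch; neither exists in PubsubIO today.

   ```java
   // Hypothetical API, not implemented in Beam today: messages are NOT acked
   // when the read stage succeeds; instead an explicit terminal transform acks
   // them once the whole (idempotent) pipeline has processed them.
   Pipeline p = Pipeline.create(options);  // options assumed to be defined

   p.apply("Read", PubsubIO.readMessages()
           .fromSubscription("projects/my-project/subscriptions/my-sub")
           .withAutoAckOnSuccess(false))            // proposed option
    .apply("Process", ParDo.of(new IdempotentProcessFn()))  // user's idempotent logic
    .apply("Ack", PubsubAck.acknowledge());         // hypothetical explicit-ack transform

   p.run();
   ```

   If the job crashes before the final ack, the messages remain unacked and 
   are simply redelivered to the replacement job.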


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
