ppawel commented on PR #37164: URL: https://github.com/apache/beam/pull/37164#issuecomment-3728086444
@stankiewicz I tested this new approach and I can still reproduce the original problem (lost messages) on Dataflow. It is quite easy to trigger by adding/removing workers at runtime while messages are flowing, using `gcloud dataflow jobs update-options --min-num-workers/--max-num-workers`. If I lock the job to 1 worker, lost messages happen less frequently but still occur (see my original report).

The logs are the same as in my original report, e.g.:

```
Work is no longer active on the backend, it already succeeded or will be retried. sharding_key=e101de22c0c8af04 status: INTERNAL: Windmill failed to commit the work item. CommitStatus: NOT_FOUND
```

There are no additional logs related to NACKing, deadline exceeded, etc. It might be useful to add some logging in this branch so it is visible that the new logic is actually being executed.

One question regarding the exact code version to test: I took the release 2.70.0 tag from Git and applied your commit `3149f96c728235a63da38c76c94f76197ad716ea` on top of it, and that is what I tested. Is that the correct state to test? Just want to confirm.
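For reference, the worker-churn repro I used can be sketched roughly as below. `JOB_ID`, `REGION`, and the autoscaling bounds are placeholders, not the exact values from my runs; the `scale` helper only prints the `gcloud` invocation so the sketch is safe to run anywhere (drop the leading `echo` to actually apply it):

```shell
# Rough sketch of the repro: force worker churn while the pipeline is
# consuming messages. JOB_ID/REGION are placeholders; bounds illustrative.
JOB_ID="<dataflow-job-id>"
REGION="<region>"

# Print the gcloud invocation for the given autoscaling bounds.
# Remove the leading "echo" to execute for real once the placeholders
# are filled in.
scale() {
  echo gcloud dataflow jobs update-options \
    --region="$REGION" \
    --min-num-workers="$1" \
    --max-num-workers="$2" \
    "$JOB_ID"
}

scale 1 5   # scale up while messages are flowing
scale 1 1   # then force workers back down to trigger work-item movement
```

Alternating the bounds like this while the pipeline is under load is what reliably surfaced the `NOT_FOUND` commit failures for me.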
