reuvenlax commented on PR #27495: URL: https://github.com/apache/beam/pull/27495#issuecomment-1635657717
What exactly was the failure that got retried 1000 times here?

On Fri, Jul 14, 2023 at 11:27 AM Ahmed Abualsaud ***@***.***> wrote:

> @reuvenlax I think the current state also leaves us vulnerable to large
> tail latency. We've seen a case where a work item (due to other issues)
> gets duplicated across workers. One worker succeeded and the other kept
> failing here. It retried all of 1000 times before failing the work item
> (after that Dataflow didn't retry because the work item was processed by
> another worker). Overall, this one work item took ~4 hours.
>
> Probably would be good to find a balance for the appropriate number of
> retries.
>
> What testing has been done here?
>
> What kind of testing would you like to see? The retry functionality hasn't
> really changed, just reducing the number of retries.
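The tail-latency concern comes down to arithmetic. With a fixed retry budget, the worst-case time a work item can spend before giving up is the sum of every failing attempt's duration plus the backoff sleeps between attempts. The thread reports 1000 retries taking roughly 4 hours, which averages out to about 14 s per attempt. Below is a minimal sketch of that bound, assuming a capped exponential backoff. The function name and all parameter values are hypothetical and are not Beam's actual retry policy.

```python
def worst_case_retry_seconds(
    max_attempts: int,
    initial_backoff_s: float,
    multiplier: float,
    max_backoff_s: float,
    attempt_s: float = 0.0,
) -> float:
    """Upper bound on wall-clock seconds spent before a work item gives up.

    Hypothetical model (not Beam's real policy): every attempt fails and
    takes `attempt_s` seconds, and a capped exponential-backoff sleep
    follows every attempt except the last one.
    """
    total = 0.0
    backoff = min(initial_backoff_s, max_backoff_s)
    for attempt in range(max_attempts):
        total += attempt_s  # time spent in the failing call itself
        if attempt < max_attempts - 1:
            total += backoff  # sleep before the next attempt
            backoff = min(backoff * multiplier, max_backoff_s)
    return total


# Hypothetical policy: 1 s initial backoff, doubling, capped at 10 s,
# ignoring the cost of the attempts themselves.
print(worst_case_retry_seconds(1000, 1.0, 2.0, 10.0))  # 9965.0 (~2.8 h)
print(worst_case_retry_seconds(10, 1.0, 2.0, 10.0))    # 65.0
```

Once the backoff reaches its cap, each additional retry adds a roughly constant amount of time, so the worst case grows about linearly with the retry count. Under this model, lowering the retry count is the direct lever on tail latency.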
