pnowojski commented on pull request #17796: URL: https://github.com/apache/flink/pull/17796#issuecomment-976720425
I think the scenario that you posted works fine. After `endInput()` in step 2 there shouldn't be any registered timers left, as they are cancelled in `AsyncWaitOperator.ResultHandler#processResults`. And `AsyncWaitOperator.ResultHandler#completed` appears to prevent a race between firing and cancelling the timer at the same time. Also, if you take a look at the reported stack trace, it comes from `processResults`, so this is not a timeout trigger. Nevertheless, we should also take the timeout case into consideration.

As far as I understand, the problem described in the ticket applies to `stop-with-savepoint --without-drain`, where `endInput()` is NOT called. In that scenario, maybe we don't want to duplicate the `endInput()` logic, as this would unnecessarily delay the `stop-with-savepoint --without-drain` while having no meaningful result for the user? Those records would be re-processed by the async function either way after recovering from such a savepoint.

So it looks like we might have two paths to take (in `finish()`?):

1. wait for the results to be processed or to time out
2. cancel those results

Another question is whether the same issue exists for regular cancellation.

And yet another thing to take into consideration: with the FLIP-147 follow-up, we wanted to unify stop-with-savepoint with and without drain. Keeping this in mind might affect our plans to address this bug?
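
To make the two options concrete, here is a rough, self-contained sketch of what "wait for the results" versus "cancel those results" could look like. This is not Flink code: `InFlightResults`, `register()`, `finish(boolean, long)` and the `String` result type are hypothetical names made up for illustration, and the real `AsyncWaitOperator` queue and timer handling are more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for an operator's in-flight async results (not a Flink class). */
final class InFlightResults {
    private final List<CompletableFuture<String>> pending = new ArrayList<>();
    private final List<String> emitted = new ArrayList<>();

    /** Registers a new in-flight request and returns its future. */
    CompletableFuture<String> register() {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.add(future);
        return future;
    }

    /** Assumed entry point: drain == true takes path 1, drain == false takes path 2. */
    void finish(boolean drain, long timeoutMillis) {
        if (drain) {
            waitForResults(timeoutMillis);
        } else {
            cancelResults();
        }
    }

    /** Path 1: wait for each result, emitting it or giving up after the per-result timeout. */
    private void waitForResults(long timeoutMillis) {
        for (CompletableFuture<String> future : pending) {
            try {
                emitted.add(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                future.cancel(false);
            } catch (ExecutionException | TimeoutException | CancellationException e) {
                future.cancel(false); // failed, timed out, or already cancelled
            }
        }
        pending.clear();
    }

    /** Path 2: drop everything; the records would be re-processed after recovery. */
    private void cancelResults() {
        for (CompletableFuture<String> future : pending) {
            future.cancel(false);
        }
        pending.clear();
    }

    List<String> emitted() {
        return emitted;
    }
}
```

In this sketch path 1 can block for up to the timeout per pending result, which is the delay mentioned above for the without-drain case, while path 2 returns immediately and emits nothing.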
