pnowojski commented on pull request #17796:
URL: https://github.com/apache/flink/pull/17796#issuecomment-976720425


   I think the scenario that you posted works fine. After `endInput()` in step 
2, there shouldn't be any registered timers left, as they are cancelled 
in `AsyncWaitOperator.ResultHandler#processResults`. And 
`AsyncWaitOperator.ResultHandler#completed` seems to prevent a 
race between firing and cancelling the timer at the same time.
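
   To illustrate the kind of guard I mean, here is a minimal sketch. It 
assumes that `completed` acts as a compare-and-set flag. The class and method 
names are hypothetical and this is not the actual `AsyncWaitOperator` code:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a result handler; NOT Flink's actual class. */
class OnceOnlyHandler {
    // Assumption: a single flag decides whether the result or the timeout wins.
    private final AtomicBoolean completed = new AtomicBoolean(false);
    private final AtomicInteger emitted = new AtomicInteger();
    private volatile String outcome;

    /** Called when the async function delivers a result. */
    boolean complete(String result) {
        return finishOnce("result:" + result);
    }

    /** Called when the timeout timer fires. */
    boolean timeout() {
        return finishOnce("timeout");
    }

    private boolean finishOnce(String value) {
        // Only the first caller flips the flag; the loser returns
        // without emitting anything or touching the outcome.
        if (!completed.compareAndSet(false, true)) {
            return false;
        }
        outcome = value;
        emitted.incrementAndGet();
        return true;
    }

    String outcome() {
        return outcome;
    }

    int emittedCount() {
        return emitted.get();
    }
}

class OnceOnlyDemo {
    public static void main(String[] args) {
        OnceOnlyHandler handler = new OnceOnlyHandler();
        boolean resultWon = handler.complete("a"); // the result arrives first
        boolean timerWon = handler.timeout();      // a late timer is ignored
        System.out.println(resultWon + " " + timerWon + " " + handler.outcome());
    }
}
```

   Whichever of the two callbacks arrives first wins, and the other becomes a 
no-op, so a timer firing while it is being cancelled cannot emit a second 
result.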
   
   Also if you take a look at the reported stacktrace, it's coming from 
`processResults`, so this is not a timeout trigger. Nevertheless, we should 
also take the timeout case into consideration.
   
   As far as I understand, the problem described in the ticket applies to 
`stop-with-savepoint --without-drain`, where `endInput()` is NOT being called. 
   
   In that scenario, maybe we don't want to duplicate the `endInput()` logic, 
as this would unnecessarily delay the `stop-with-savepoint --without-drain`, 
while having no meaningful result for the user? Those records would be 
re-processed by the async function either way after recovering from such a 
savepoint. 
   
   So it looks like we might have two paths to take (in `finish()`?):
   1. wait for the results to be processed or to time out
   2. cancel those results
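
   A rough sketch of the two options, using plain `CompletableFuture`s in 
place of the operator's in-flight queue. All names here (`InFlightRequests`, 
`drain`, `cancelAll`) are hypothetical and only meant to show the difference 
in behaviour, not how it would be wired into `finish()`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical model of an operator's in-flight async requests. */
class InFlightRequests {
    private final List<CompletableFuture<String>> pending = new ArrayList<>();

    /** Registers a new in-flight request and returns its future. */
    CompletableFuture<String> register() {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.add(future);
        return future;
    }

    /** Path 1: wait for each request to complete or time out. */
    List<String> drain(long perRequestTimeoutMillis) {
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> future : pending) {
            try {
                results.add(future.get(perRequestTimeoutMillis, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                results.add("timeout");
            } catch (ExecutionException e) {
                results.add("failed");
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up on draining.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while draining", e);
            }
        }
        pending.clear();
        return results;
    }

    /** Path 2: cancel everything still in flight; returns how many were cancelled. */
    int cancelAll() {
        int cancelled = 0;
        for (CompletableFuture<String> future : pending) {
            // cancel() returns false for futures that already completed normally.
            if (future.cancel(false)) {
                cancelled++;
            }
        }
        pending.clear();
        return cancelled;
    }
}

class InFlightDemo {
    public static void main(String[] args) {
        InFlightRequests drained = new InFlightRequests();
        drained.register().complete("x");
        drained.register(); // never completes, so draining has to wait for the timeout
        System.out.println(drained.drain(10));

        InFlightRequests cancelled = new InFlightRequests();
        cancelled.register();
        System.out.println(cancelled.cancelAll());
    }
}
```

   Path 1 delays shutdown by up to the timeout for every outstanding request, 
while path 2 returns immediately and relies on the records being re-processed 
after recovery.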
   
   Another question: is there the same issue for regular cancellation? 
   
   And yet another thing to take into consideration: with the FLIP-147 
follow-up, we wanted to unify stop-with-savepoint with and without drain. 
Keeping this in mind might affect how we plan to address this bug.

