Hello, What happens when a task in a running job fails? Will all the current executions of the job's tasks fail? Will all the slots being used by the job tasks (failed and non-failed ones) be released.
Assuming all the slots are released, wouldn’t it make sense to: 1. “stop” the non-failed tasks and keep them holding their slots. Where “stop” may mean something like stop processing tuples from the input streams. 2. Re-schedule the failed task to a new slot (imagine the task failed because the Task Manager owning that slot failed) 3. Recover state to the last snapshot. 4. “re-start”. Where “re-start” means start processing tuples from the input streams. Thanks, Luís Alves