philipherrmann commented on issue #12111:
URL: https://github.com/apache/airflow/issues/12111#issuecomment-734829587


   I think this is because of the 
   `reattach_on_restart`idea. The parameter is documented as "if the scheduler 
dies while the pod is running, reattach and monitor", but, I think, "while the 
pod is running" seems to checked in an incorrect manner. It seems that the 
Operator searches the pod's metadata for an "already_checked" label with value 
True. This label seems to be set only in the "call stack" 
   ```
   handle_pod_overlap > monitor_launched_pod > final_state != State.SUCCESS
   ```
   This might be intended, in terms of idempotency - unclear to me. At least it 
unexpected to me. 
   
   Possible workarounds are setting `is_delete_operator_pod = True` or 
`reattach_on_restart = False` when construction the Operator. While both 
settings allow retries, they have different side effects.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to