FloChehab edited a comment on pull request #10230:
URL: https://github.com/apache/airflow/pull/10230#issuecomment-679274286


   Hi @dimberman,
   
   I was doing more Airflow testing, and I think this PR also addresses 
https://github.com/apache/airflow/issues/10325 (which I was hitting on an older 
Airflow version). That is great news (we ran into production issues with this 
the other day)!
   
   Unfortunately, I can still trigger issues with the KubernetesPodOperator 
(with the latest 1.10.12rc):
   
   1.
   * Process: start Airflow, trigger the DAG containing a KubernetesPodOperator 
task, kill everything except the pod running the task, then wait for the task 
to complete (status `Completed` on the Kubernetes API).
   * When the scheduler is restarted, the task seems to be stuck in 
`up_for_retry`; if I restart the scheduler again, the task is then marked as 
success (`[scheduler] [2020-08-24 17:36:16,190] {base_executor.py:157} DEBUG - 
Changing state: ('bug_kuberntes_pod_operator', 'task', datetime.datetime(2020, 
8, 24, 17, 27, 49, 493579, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 
2)`). Weird; this seems to be 100% reproducible (I've tried three times).
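   For reference, the reproduction above can be sketched roughly as the 
following commands. This is only a sketch against a live Airflow 1.10.x 
deployment: the DAG id `bug_kuberntes_pod_operator` comes from the scheduler 
log above, and `<namespace>` plus the exact pod name are placeholders you would 
need to adapt to your own setup.

```shell
# Trigger the DAG that contains the KubernetesPodOperator task
# (DAG id taken from the scheduler log above).
airflow trigger_dag bug_kuberntes_pod_operator

# Kill the Airflow processes (scheduler, webserver, workers) but leave the
# task pod running, then watch it finish on the Kubernetes side.
# <namespace> is a placeholder for the namespace the pod runs in.
kubectl get pods -n <namespace> --watch
# ...wait until the task pod reports STATUS "Completed"...

# Restart the scheduler: the task instance gets stuck in up_for_retry.
airflow scheduler

# Restart the scheduler a second time: the task is now marked as success.
airflow scheduler
```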
   
   2.
   * I've also experienced a situation (still with the latest 1.10.12rc) where 
the task was marked as `Completed` on the Kubernetes side but `running` on the 
Airflow side for 30+ minutes, after which it was marked as failed (without 
scheduler restarts, if I remember correctly; that is how I found the issue 
above in the first place). I haven't reproduced it yet. Could there be weird 
edge cases where this could happen? (I am working with the latest chart + 
Celery executor + KEDA.)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

