ashb commented on code in PR #23846:
URL: https://github.com/apache/airflow/pull/23846#discussion_r890269996


##########
airflow/jobs/scheduler_job.py:
##########
@@ -664,7 +663,20 @@ def _process_executor_events(self, session: Session = None) -> int:
                 ti.pid,
             )
 
-            if ti.try_number == buffer_key.try_number and ti.state == State.QUEUED:
+            # There are two scenarios why the same TI with the same try_number is queued
+            # after the executor is finished with it:
+            # 1) the TI was killed externally and it had no time to mark itself failed
+            # - in this case we should mark it as failed here.
+            # 2) the TI has been requeued after getting deferred - in this case either our executor has it
+            # or the TI is queued by another job. Either way we should not fail it.
+
+            # All of this could also happen if the state is "running",
+            # but that is handled by the zombie detection.
+
+            ti_queued = ti.try_number == buffer_key.try_number and ti.state == TaskInstanceState.QUEUED
+            ti_requeued = ti.queued_by_job_id != self.id or self.executor.has_task(ti)

Review Comment:
   There is also a `queued_dttm` column -- is it worth checking that its value is "recent"? (I don't know the answer, just asking questions.)
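   A minimal, standalone sketch of how the two flags from the diff might combine with an optional `queued_dttm` recency check. It assumes the TI is failed only when `ti_queued and not ti_requeued` (the diff is cut off before the flags are used). `TaskInstanceSnapshot`, `should_fail_queued_ti` and the `max_queued_age` threshold are hypothetical names for illustration, not Airflow APIs; the plain string "queued" stands in for `TaskInstanceState.QUEUED`, and whether a recency check is sound at all is the open question above.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class TaskInstanceSnapshot:
    """Hypothetical stand-in for the TaskInstance fields used by the check."""

    try_number: int
    state: str
    queued_by_job_id: Optional[int]
    queued_dttm: Optional[datetime]


def should_fail_queued_ti(
    ti: TaskInstanceSnapshot,
    event_try_number: int,
    scheduler_job_id: int,
    executor_has_task: bool,
    now: datetime,
    max_queued_age: Optional[timedelta] = None,
) -> bool:
    """Return True if a QUEUED TI the executor reports as finished should be failed.

    Assumption: the flags combine as `ti_queued and not ti_requeued`.
    `max_queued_age` is a hypothetical knob for the `queued_dttm` question.
    """
    # Same try_number and still QUEUED after the executor finished with it.
    ti_queued = ti.try_number == event_try_number and ti.state == "queued"
    # Queued by another job, or our executor still holds it (e.g. after deferral).
    ti_requeued = ti.queued_by_job_id != scheduler_job_id or executor_has_task
    if not ti_queued or ti_requeued:
        return False
    if max_queued_age is not None and ti.queued_dttm is not None:
        # A very recent queued_dttm hints that the TI was requeued after the
        # executor event was produced, so do not fail it.
        if now - ti.queued_dttm < max_queued_age:
            return False
    return True


if __name__ == "__main__":
    now = datetime(2022, 6, 6, 12, 0, tzinfo=timezone.utc)
    ti = TaskInstanceSnapshot(1, "queued", 42, now - timedelta(minutes=10))
    print(should_fail_queued_ti(ti, 1, 42, False, now))
    print(should_fail_queued_ti(ti, 1, 42, False, now, timedelta(minutes=30)))
```

   With no threshold the sketch reduces to the two flags from the diff; the recency check only ever suppresses a failure, so it cannot fail a TI the flags alone would have spared.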
