kzosabe commented on code in PR #34771:
URL: https://github.com/apache/airflow/pull/34771#discussion_r1542257954
##########
airflow/jobs/scheduler_job_runner.py:
##########
@@ -618,6 +618,11 @@ def _executable_task_instances_to_queued(self, max_tis: int, session: Session) -
)
for ti in executable_tis:
+ # ti.start_date could be None when the scheduler queue a TI
+ # or when the backfill CLI send a TI to the executor
+ # in this case set it at this line because emit_state_change_metric doesn't expect it
Review Comment:
> When can start_date not be None?
This call only occurs when TI.state transitions from scheduled to queued,
so start_date is normally None.
The one exception, which hussein-awala and I both mentioned, is backfill.
Currently, whenever set_state is called, ti.start_date is assigned a value
regardless of the target state.
https://github.com/apache/airflow/blob/5c7b3e9fa7e9a46044f02ef7a31ebc0344cfb816/airflow/models/taskinstance.py#L1885
Backfill uses set_state to rewind the state to scheduled, which leaves a TI
in the scheduled state with start_date already set.
https://github.com/apache/airflow/blob/77341ef6a1e4ffa3f8d3275eade325c89f2c95f2/airflow/jobs/backfill_job_runner.py#L428
This is an implementation error in set_state: it does not account for
reverting a TI to a pre-running state.
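
To make the set_state problem concrete, here is a minimal sketch. It is a
simplified stand-in, not Airflow's actual code: FakeTI, set_state_current,
set_state_guarded and PRE_RUNNING_STATES are all hypothetical names, and the
guarded variant is only one possible way to handle the rewind.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Assumption: states a TI can be in before it has actually started running.
PRE_RUNNING_STATES = {"scheduled", "queued"}


@dataclass
class FakeTI:
    """Hypothetical stand-in for a task instance; not the Airflow model."""

    state: Optional[str] = None
    start_date: Optional[datetime] = None


def set_state_current(ti: FakeTI, state: str) -> None:
    """Models the behaviour described above: start_date gets a value
    no matter which state is being set."""
    ti.state = state
    ti.start_date = ti.start_date or datetime.now(timezone.utc)


def set_state_guarded(ti: FakeTI, state: str) -> None:
    """One possible fix (an assumption): clear start_date when the TI is
    rewound to a pre-running state, stamp it otherwise."""
    ti.state = state
    if state in PRE_RUNNING_STATES:
        ti.start_date = None
    else:
        ti.start_date = ti.start_date or datetime.now(timezone.utc)


# Backfill-style rewind: the TI ends up scheduled with a start_date.
rewound = FakeTI()
set_state_current(rewound, "scheduled")
print(rewound.state, rewound.start_date is None)

# With the guard, the same rewind leaves start_date empty.
guarded = FakeTI(state="running", start_date=datetime.now(timezone.utc))
set_state_guarded(guarded, "scheduled")
print(guarded.state, guarded.start_date is None)
```

With the guarded variant, a TI reaching the scheduler after a rewind would
look the same as one scheduled normally.
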
Apart from the set_state bug described above, the log was implemented
incorrectly from the beginning: in practice, start_date cannot be anything
other than None at this location.
As mentioned, an equivalent log cannot be implemented correctly unless a
field like scheduled_dttm is added.
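
As a rough illustration of what such a field would enable, the sketch below
records a timestamp on entering scheduled and derives the wait time on
entering queued. Everything here is hypothetical: SketchTI, mark_scheduled
and mark_queued do not exist in Airflow, and scheduled_dttm is the proposed
field, not an existing column.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class SketchTI:
    """Hypothetical task instance carrying the proposed scheduled_dttm field."""

    state: Optional[str] = None
    scheduled_dttm: Optional[datetime] = None


def mark_scheduled(ti: SketchTI, now: datetime) -> None:
    """Record when the TI entered the scheduled state."""
    ti.state = "scheduled"
    ti.scheduled_dttm = now


def mark_queued(ti: SketchTI, now: datetime) -> Optional[timedelta]:
    """Move the TI to queued and return how long it waited in scheduled,
    or None if no scheduled timestamp was recorded."""
    waited = None if ti.scheduled_dttm is None else now - ti.scheduled_dttm
    ti.state = "queued"
    return waited


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
ti = SketchTI()
mark_scheduled(ti, t0)
print(mark_queued(ti, t0 + timedelta(seconds=5)))
```

The point is only that the duration comes from a timestamp owned by the
scheduled state, so it does not depend on start_date at all.
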