omkar-foss commented on issue #40841: URL: https://github.com/apache/airflow/issues/40841#issuecomment-2255245212
Hey @shai-ikko, good to hear from you. Apologies for my delayed response here.

In [this PR](https://github.com/apache/airflow/pull/40963), the check I've added - verifying whether the last dagrun is still running - resolves the issue where multiple dagruns are spawned and more than one of them tries to process the same file. The issue occurs because the task instances of each dagrun call `_get_dep_statuses()`, which returns a passing status, so task instances in more than one dagrun try to process the same file (only one of them succeeds; the others fail). A minimal standalone sketch of this guard's effect is at the end of this comment.

I've intentionally not touched the other checks below it, as those seem to have been added to handle specific cases. For example, for the check you mentioned - `if not self._has_tis(last_dagrun, ti.task_id, session=session)` - **I tried moving it above the `last_dagrun.execution_date < ti.task.start_date` check without any other changes, and it results in blocking new dagruns from getting scheduled after the current ones are complete** (see screenshot below).

Tagging two of the pros I'm aware of who might be able to help here - @ashb @uranusjr, please help with these two things if possible:

1. Kindly help verify whether [this PR](https://github.com/apache/airflow/pull/40963) suffices to resolve this GitHub issue, or whether other changes would be required.
2. Kindly share any context you may have about the reasoning behind updating `task.start_date`, as [this check](https://github.com/apache/airflow/blob/main/airflow/ti_deps/deps/prev_dagrun_dep.py#L159-L163) returns `True` for more than one task instance in parallel DagRuns, causing them to process the same file.
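To make the failure mode concrete, here is a minimal, standalone Python sketch of what the added guard does. This is not the actual code from PR #40963: `prev_dagrun_dep_passes` and the local `DagRunState` enum are simplified stand-ins for the dep logic in `prev_dagrun_dep.py`, written so the example runs on its own.

```python
from enum import Enum


class DagRunState(str, Enum):
    # Simplified stand-in for airflow.utils.state.DagRunState.
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


def prev_dagrun_dep_passes(last_dagrun_state: DagRunState) -> bool:
    """Stand-in for the guard added in PR #40963: while the previous dagrun
    is still running, the dep fails, so the task instance in the newer
    dagrun is not started and cannot race the older one for the same file."""
    if last_dagrun_state == DagRunState.RUNNING:
        return False  # upstream would yield a failing dep status here
    return True  # the remaining checks (start_date, _has_tis, ...) still run


# Without the guard, both overlapping dagruns get a passing dep status and
# both task instances try to process the same file; with it, only the dagrun
# whose predecessor has finished proceeds.
assert prev_dagrun_dep_passes(DagRunState.SUCCESS) is True
assert prev_dagrun_dep_passes(DagRunState.RUNNING) is False
```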
