omkar-foss commented on issue #40841:
URL: https://github.com/apache/airflow/issues/40841#issuecomment-2255245212

   Hey @shai-ikko, good to hear from you. Apologies on my delayed response here.
   
   So in [this PR](https://github.com/apache/airflow/pull/40963), the check 
I've added (whether the last dagrun is still running) helps resolve the 
issue where multiple dagruns are spawned and more than one of them tries to 
process the same file. The issue occurs because the task instances of each 
dagrun call `_get_dep_statuses()`, which returns a passing status, so the 
task instances in more than one dagrun try to process the same file (only one 
of them succeeds; the others fail).
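   
   To illustrate the idea only, here is a toy sketch of such a check. It is 
not the code from the PR and does not use Airflow's models: `ToyDagRun`, 
`RunState` and `dep_passes` are made-up names, and the rule that any 
non-running previous run lets the dep pass is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RunState(Enum):
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class ToyDagRun:
    """Hypothetical stand-in for a dagrun; not the Airflow DagRun model."""

    run_id: str
    state: RunState


def dep_passes(last_dagrun: Optional[ToyDagRun]) -> bool:
    """Return True if a task instance may proceed (toy rule, an assumption).

    With no previous dagrun the dep passes. If the previous dagrun is still
    running, the dep fails, so a parallel dagrun cannot pick up the same file.
    """
    if last_dagrun is None:
        return True
    if last_dagrun.state is RunState.RUNNING:
        return False
    return True
```

   Without the `RUNNING` branch, every parallel dagrun in this toy model 
would get a passing status, which mirrors the behaviour described above.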
   
   I've intentionally not touched the other checks below it, as those seem to 
have been added to handle specific cases. For example, the check you mentioned - `if not 
self._has_tis(last_dagrun, ti.task_id, session=session)` - **I tried moving 
this check above the `last_dagrun.execution_date < ti.task.start_date` check 
without any other changes, and it resulted in new dagruns being blocked from getting 
scheduled after the current ones completed** (see screenshot below).
   
   ![Screenshot at 2024-07-29 
12-30-06](https://github.com/user-attachments/assets/ffcdb066-9c61-46bd-9590-ead67fd70b2c)
   
   Tagging two of the pros I'm aware of who might be able to help here - @ashb 
@uranusjr, please help with these two things, if possible:
   
   1. Kindly help verify if [this 
PR](https://github.com/apache/airflow/pull/40963) will suffice to resolve this 
GitHub issue, or if some other changes would be required.
   2. Kindly share any context you have about the reasoning behind 
updating `task.start_date`, as [this 
check](https://github.com/apache/airflow/blob/main/airflow/ti_deps/deps/prev_dagrun_dep.py#L159-L163)
 returns `True` for more than one task instance in parallel DagRuns, causing 
them to process the same file.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

