uranusjr commented on code in PR #41384:
URL: https://github.com/apache/airflow/pull/41384#discussion_r1741682178


##########
tests/models/test_taskinstance.py:
##########
@@ -5312,3 +5312,65 @@ def test_swallow_mini_scheduler_exceptions(_schedule_downstream_mock, create_tas
     ti.schedule_downstream_tasks()
     assert "Error scheduling downstream tasks." in caplog.text
     assert "To be swallowed" in caplog.text
+
+
+def test_ti_selector_condition(dag_maker):
+    from airflow.utils.timezone import datetime
+
+    clear_db_runs()

Review Comment:
   Better to also use a fixture here. `clean_dags_and_dagruns` would be useful.
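   A minimal sketch of that suggestion (assuming `clean_dags_and_dagruns` is an existing fixture available to this test module; the test body is otherwise unchanged apart from dropping the manual cleanup):

   ```python
   import pytest


   # Sketch only: the fixture resets DAG/DagRun state around the test,
   # so the explicit clear_db_runs() call in the body can be removed.
   @pytest.mark.usefixtures("clean_dags_and_dagruns")
   def test_ti_selector_condition(dag_maker):
       from airflow.utils.timezone import datetime

       start_date = datetime(2024, 1, 1)
       # ... rest of the test as in the diff, minus clear_db_runs()
   ```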



##########
tests/models/test_taskinstance.py:
##########
@@ -5312,3 +5312,65 @@ def test_swallow_mini_scheduler_exceptions(_schedule_downstream_mock, create_tas
     ti.schedule_downstream_tasks()
     assert "Error scheduling downstream tasks." in caplog.text
     assert "To be swallowed" in caplog.text
+
+
+def test_ti_selector_condition(dag_maker):
+    from airflow.utils.timezone import datetime
+
+    clear_db_runs()
+    files = ["a", "b", "c"]
+
+    start_date = datetime(2024, 1, 1)
+    files = ["file1", "file2", "file3"]
+
+    with dag_maker(dag_id="task_group_mapping_example", start_date=start_date, schedule=None, catchup=False):
+
+        @task_group(group_id="etl")
+        def etl_pipeline(file):
+            e = EmptyOperator(task_id="e")
+            t = EmptyOperator(task_id="t")
+            last = EmptyOperator(task_id="last")
+
+            e >> t >> last
+
+        etl_pipeline.expand(file=files)
+
+    dag_instance = dag_maker.dag
+    dag_maker.create_dagrun(
+        run_id="manual_run_2024_01_01",
+        state=State.SUCCESS,
+        execution_date=start_date,
+        start_date=start_date,
+        data_interval=(start_date, start_date),
+        run_type=DagRunType.SCHEDULED,
+    )
+
+    # with map_index
+    task_id = "etl.e"
+    task_id_or_regex = [task_id]
+    map_indexes = [0, 1]
+    task_ids = [(task_id, map_index) for map_index in map_indexes]
+    partial_dag = dag_instance.partial_subset(
+        task_ids_or_regex=task_id_or_regex,
+        include_downstream=True,
+        include_upstream=False,
+    )
+
+    # handling downstream tasks
+    if len(partial_dag.task_dict) > 1:
+        task_ids.extend(tid for tid in partial_dag.task_dict if tid != task_id)
+
+    with create_session() as session:

Review Comment:
   There’s also a `session` fixture. (There’s one on `dag_maker` as well.)
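   Two sketches of what that could look like; the query shown is hypothetical, since the hunk cuts off before the session is used:

   ```python
   from airflow.models.taskinstance import TaskInstance

   # Option 1 (sketch): request the pytest `session` fixture instead of
   # opening one manually with create_session().
   def test_ti_selector_condition(dag_maker, session):
       ...
       # Hypothetical use of the session, for illustration only.
       tis = session.query(TaskInstance).all()

   # Option 2 (sketch): inside the test, reuse the session dag_maker carries:
   #     tis = dag_maker.session.query(TaskInstance).all()
   ```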



##########
tests/models/test_taskinstance.py:
##########
@@ -5312,3 +5312,65 @@ def test_swallow_mini_scheduler_exceptions(_schedule_downstream_mock, create_tas
     ti.schedule_downstream_tasks()
     assert "Error scheduling downstream tasks." in caplog.text
     assert "To be swallowed" in caplog.text
+
+
+def test_ti_selector_condition(dag_maker):
+    from airflow.utils.timezone import datetime
+
+    clear_db_runs()
+    files = ["a", "b", "c"]
+
+    start_date = datetime(2024, 1, 1)
+    files = ["file1", "file2", "file3"]
+
+    with dag_maker(dag_id="task_group_mapping_example", start_date=start_date, schedule=None, catchup=False):
+
+        @task_group(group_id="etl")
+        def etl_pipeline(file):
+            e = EmptyOperator(task_id="e")
+            t = EmptyOperator(task_id="t")
+            last = EmptyOperator(task_id="last")
+
+            e >> t >> last
+
+        etl_pipeline.expand(file=files)
+
+    dag_instance = dag_maker.dag
+    dag_maker.create_dagrun(
+        run_id="manual_run_2024_01_01",
+        state=State.SUCCESS,
+        execution_date=start_date,
+        start_date=start_date,
+        data_interval=(start_date, start_date),
+        run_type=DagRunType.SCHEDULED,
+    )
+
+    # with map_index
+    task_id = "etl.e"
+    task_id_or_regex = [task_id]
+    map_indexes = [0, 1]
+    task_ids = [(task_id, map_index) for map_index in map_indexes]
+    partial_dag = dag_instance.partial_subset(
+        task_ids_or_regex=task_id_or_regex,
+        include_downstream=True,
+        include_upstream=False,
+    )
+
+    # handling downstream tasks
+    if len(partial_dag.task_dict) > 1:
+        task_ids.extend(tid for tid in partial_dag.task_dict if tid != task_id)

Review Comment:
   There’s exactly one case in this test… either we need the `extend` call or we don’t; there’s no need for the `if` condition.
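   For illustration, the unconditional version the comment suggests (assuming, as in this test, that `include_downstream=True` always pulls extra tasks into `partial_dag`):

   ```python
   # Sketch: no guard needed; extend unconditionally with the downstream tasks.
   task_ids.extend(tid for tid in partial_dag.task_dict if tid != task_id)
   ```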



##########
tests/models/test_taskinstance.py:
##########
@@ -5312,3 +5312,65 @@ def test_swallow_mini_scheduler_exceptions(_schedule_downstream_mock, create_tas
     ti.schedule_downstream_tasks()
     assert "Error scheduling downstream tasks." in caplog.text
     assert "To be swallowed" in caplog.text
+
+
+def test_ti_selector_condition(dag_maker):
+    from airflow.utils.timezone import datetime
+
+    clear_db_runs()
+    files = ["a", "b", "c"]
+
+    start_date = datetime(2024, 1, 1)
+    files = ["file1", "file2", "file3"]
+
+    with dag_maker(dag_id="task_group_mapping_example", start_date=start_date, schedule=None, catchup=False):
+
+        @task_group(group_id="etl")
+        def etl_pipeline(file):
+            e = EmptyOperator(task_id="e")
+            t = EmptyOperator(task_id="t")
+            last = EmptyOperator(task_id="last")
+
+            e >> t >> last
+
+        etl_pipeline.expand(file=files)
+
+    dag_instance = dag_maker.dag
+    dag_maker.create_dagrun(
+        run_id="manual_run_2024_01_01",
+        state=State.SUCCESS,
+        execution_date=start_date,
+        start_date=start_date,
+        data_interval=(start_date, start_date),
+        run_type=DagRunType.SCHEDULED,
+    )

Review Comment:
   Manual or scheduled? The `run_id` says manual but `run_type` is `DagRunType.SCHEDULED`. (It doesn’t really matter in this test, but it’s inconsistent.)
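   A sketch of one consistent version, with values copied from the diff; switching to `DagRunType.MANUAL` while keeping the existing `run_id` would work just as well:

   ```python
   # Sketch: make run_id agree with run_type=DagRunType.SCHEDULED.
   dag_maker.create_dagrun(
       run_id="scheduled_run_2024_01_01",
       state=State.SUCCESS,
       execution_date=start_date,
       start_date=start_date,
       data_interval=(start_date, start_date),
       run_type=DagRunType.SCHEDULED,
   )
   ```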


