[
https://issues.apache.org/jira/browse/AIRFLOW-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17162067#comment-17162067
]
ASF GitHub Bot commented on AIRFLOW-5391:
-----------------------------------------
kaxil commented on a change in pull request #8992:
URL: https://github.com/apache/airflow/pull/8992#discussion_r458137872
##########
File path: tests/ti_deps/deps/test_not_previously_skipped_dep.py
##########
@@ -0,0 +1,133 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import pendulum
+
+from airflow.models import DAG, TaskInstance
+from airflow.operators.dummy_operator import DummyOperator
+from airflow.operators.python_operator import BranchPythonOperator
+from airflow.ti_deps.dep_context import DepContext
+from airflow.ti_deps.deps.not_previously_skipped_dep import NotPreviouslySkippedDep
+from airflow.utils.db import create_session
+from airflow.utils.state import State
+
+
+def test_no_parent():
+    """
+    A simple DAG with a single task. NotPreviouslySkippedDep is met.
+    """
+    start_date = pendulum.datetime(2020, 1, 1)
+    dag = DAG("test_test_no_parent_dag", schedule_interval=None, start_date=start_date)
+    op1 = DummyOperator(task_id="op1", dag=dag)
+
+    ti1 = TaskInstance(op1, start_date)
+
+    with create_session() as session:
+        dep = NotPreviouslySkippedDep()
+        assert len(list(dep.get_dep_statuses(ti1, session, DepContext()))) == 0
+        assert dep.is_met(ti1, session)
+        assert ti1.state != State.SKIPPED
+
+
+def test_no_skipmixin_parent():
+    """
+    A simple DAG with no branching. Both op1 and op2 are DummyOperator. NotPreviouslySkippedDep is met.
+    """
+    start_date = pendulum.datetime(2020, 1, 1)
+    dag = DAG(
+        "test_no_skipmixin_parent_dag", schedule_interval=None, start_date=start_date
+    )
+    op1 = DummyOperator(task_id="op1", dag=dag)
+    op2 = DummyOperator(task_id="op2", dag=dag)
+    op1 >> op2
+
+    ti2 = TaskInstance(op2, start_date)
+
+    with create_session() as session:
+        dep = NotPreviouslySkippedDep()
+        assert len(list(dep.get_dep_statuses(ti2, session, DepContext()))) == 0
+        assert dep.is_met(ti2, session)
+        assert ti2.state != State.SKIPPED
+
+
+def test_parent_follow_branch():
+    """
+    A simple DAG with a BranchPythonOperator that follows op2. NotPreviouslySkippedDep is met.
+    """
+    start_date = pendulum.datetime(2020, 1, 1)
+    dag = DAG(
+        "test_parent_follow_branch_dag", schedule_interval=None, start_date=start_date
+    )
+    op1 = BranchPythonOperator(task_id="op1", python_callable=lambda: "op2", dag=dag)
+    op2 = DummyOperator(task_id="op2", dag=dag)
+    op1 >> op2
+
+    TaskInstance(op1, start_date).run()
+    ti2 = TaskInstance(op2, start_date)
+
+    with create_session() as session:
+        dep = NotPreviouslySkippedDep()
+        assert len(list(dep.get_dep_statuses(ti2, session, DepContext()))) == 0
+        assert dep.is_met(ti2, session)
+        assert ti2.state != State.SKIPPED
+
+
+def test_parent_skip_branch():
+    """
+    A simple DAG with a BranchPythonOperator that does not follow op2. NotPreviouslySkippedDep is not met.
+    """
+    start_date = pendulum.datetime(2020, 1, 1)
+    dag = DAG(
+        "test_parent_skip_branch_dag", schedule_interval=None, start_date=start_date
+    )
+    op1 = BranchPythonOperator(task_id="op1", python_callable=lambda: "op3", dag=dag)
+    op2 = DummyOperator(task_id="op2", dag=dag)
+    op3 = DummyOperator(task_id="op3", dag=dag)
+    op1 >> [op2, op3]
+
+    TaskInstance(op1, start_date).run()
+    ti2 = TaskInstance(op2, start_date)
+
+    with create_session() as session:
+        dep = NotPreviouslySkippedDep()
+        assert len(list(dep.get_dep_statuses(ti2, session, DepContext()))) == 1
+        assert not dep.is_met(ti2, session)
+        assert ti2.state == State.SKIPPED
Review comment:
`NotPreviouslySkippedDep` is a confusing name for this case. The task is
about to be skipped in this DagRun, but if it ran in the previous DagRun,
the name suggests that `NotPreviouslySkippedDep` should be met (`True`),
whereas the implementation reports it as not met (`False`).
Is my interpretation correct, @yuqian90?
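
To make the two readings of "previously" concrete, here is a minimal plain-Python sketch. It is a model for discussion only, not Airflow code: `decisions`, `record_branch_decision` and `not_previously_skipped` are hypothetical names, and the sketch assumes the dependency consults only the branch decision recorded for the same execution date as the task instance being checked.

```python
from datetime import date

# Hypothetical store of branch decisions, keyed by (branch_task_id, execution_date).
# This only models the idea; it is not how Airflow persists anything.
decisions = {}


def record_branch_decision(branch_task_id, execution_date, followed_task_ids):
    """Remember which downstream tasks the branch chose for one run."""
    decisions[(branch_task_id, execution_date)] = set(followed_task_ids)


def not_previously_skipped(task_id, branch_task_id, execution_date):
    """Return True if the dependency is met for this task in this run.

    Assumption: only the decision for the same execution_date is consulted,
    so "previously" means "earlier in this DagRun", not "in the previous DagRun".
    """
    followed = decisions.get((branch_task_id, execution_date))
    if followed is None:
        # The branch task has not recorded a decision for this run.
        return True
    return task_id in followed


previous_run = date(2020, 1, 1)
current_run = date(2020, 1, 2)

# Previous DagRun: the branch followed op2, so op2 ran.
record_branch_decision("op1", previous_run, ["op2"])
# Current DagRun: the branch followed op3, so op2 should be skipped.
record_branch_decision("op1", current_run, ["op3"])

met_in_previous_run = not_previously_skipped("op2", "op1", previous_run)
met_in_current_run = not_previously_skipped("op2", "op1", current_run)
```

Under this reading, the fact that op2 ran in the previous DagRun is irrelevant: the dependency is unmet for the current run because op1 did not follow op2 in that run.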
----------------------------------------------------------------
> Clearing a task skipped by BranchPythonOperator will cause the task to execute
> ------------------------------------------------------------------------------
>
> Key: AIRFLOW-5391
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5391
> Project: Apache Airflow
> Issue Type: Bug
> Components: operators
> Affects Versions: 1.10.4
> Reporter: Qian Yu
> Assignee: Qian Yu
> Priority: Major
> Fix For: 2.0.0
>
>
> I tried this on 1.10.3 and 1.10.4; both have this issue.
> For example, in this example from the docs, branch_a executed and
> branch_false was skipped because of the branching condition. However, if
> someone clears branch_false, branch_false will execute.
> !https://airflow.apache.org/_images/branch_good.png!
> This behaviour is understandable given how BranchPythonOperator is
> implemented. BranchPythonOperator does not store its decision anywhere. It
> skips its own downstream tasks in the branch at runtime. So there's currently
> no way for branch_false to know it should be skipped without rerunning the
> branching task.
> This is obviously counter-intuitive from the user's perspective. In this
> example, users would not expect branch_false to execute when they clear it
> because the branching task should have skipped it.
> There are a few ways to improve this:
> Option 1): Make downstream tasks skipped by BranchPythonOperator not
> clearable without also clearing the upstream BranchPythonOperator. In this
> example, if someone clears branch_false without clearing branching, the Clear
> action should fail with an error telling the user that the branching task
> needs to be cleared as well.
> Option 2): Make BranchPythonOperator store the result of its skip condition
> somewhere. Make downstream tasks check for this stored decision and skip
> themselves if they should have been skipped by the condition. This probably
> means the decision of BranchPythonOperator needs to be stored in the db.
>
> [kevcampb|https://blog.diffractive.io/author/kevcampb/] attempted a
> workaround on his blog and acknowledged that it is not perfect and that a
> better permanent fix is needed:
> [https://blog.diffractive.io/2018/08/07/replacement-shortcircuitoperator-for-airflow/]
>
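
Option 1 in the issue description above can be sketched in a few lines of plain Python. This is only an illustration of the proposed check, not Airflow code: `ClearError`, `validate_clear` and the `skipped_by` mapping are hypothetical, and the sketch assumes that some bookkeeping already records which branching task skipped which downstream task.

```python
class ClearError(Exception):
    """Raised when a clear request would leave a branch decision inconsistent."""


def validate_clear(task_ids_to_clear, skipped_by):
    """Reject clearing a task skipped by a branch unless that branch is cleared too.

    skipped_by maps a skipped task_id to the task_id of the branching task
    that skipped it (hypothetical bookkeeping, assumed to be available).
    """
    to_clear = set(task_ids_to_clear)
    for task_id in sorted(to_clear):
        branch_task_id = skipped_by.get(task_id)
        if branch_task_id is not None and branch_task_id not in to_clear:
            raise ClearError(
                f"Cannot clear {task_id!r} without also clearing {branch_task_id!r}"
            )
    return to_clear


skipped_by = {"branch_false": "branching"}

# Clearing both the branching task and the skipped task is allowed.
allowed = validate_clear(["branching", "branch_false"], skipped_by)

# Clearing only the skipped task is rejected.
try:
    validate_clear(["branch_false"], skipped_by)
    rejected = False
except ClearError:
    rejected = True
```

The check is simple but puts the burden on the user; Option 2 avoids that by letting the skipped task consult the stored decision itself.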