[ 
https://issues.apache.org/jira/browse/AIRFLOW-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965639#comment-16965639
 ] 

Qian Yu commented on AIRFLOW-2279:
----------------------------------

I'm in the process of working on something I call ClearTaskOperator. In fact, I 
already have a working PR. I was using it for a different purpose (rerunning 
completed tasks). However, the exact same operator can be used to achieve what 
[~asoni-stripe] intends.

[https://github.com/apache/airflow/pull/6392]

 

For example, if we want to make sure that whenever task_A on dag1 is cleared, 
task_B on dag2 is always cleared too, we can make dag1 look like this:
{code:python}
task_A >> clear_Task_B
{code}
 

where
{code:python}
clear_Task_B = ClearTaskOperator(
    external_task_id="task_B",
    external_dag_id="dag2",
)
{code}
So the change is actually very simple, just adding a new operator, without even 
touching the core Airflow code.

I saw [~gsilk] has an interesting requirement that I did not consider when 
coming up with the PR, namely limiting the number of tasks cleared at the same 
time. Adding this is trivial because it is just an additional check in the 
execute() function.
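
A rough sketch of what that additional check might look like. This is not the 
code in the PR: the helper name, the exception, and the max_cleared parameter 
are all made up for illustration, and the tasks are represented as a plain list 
of ids rather than real task instances.

```python
from typing import List, Optional


class TooManyTasksToClear(Exception):
    """Hypothetical error raised when a clear request exceeds the limit."""


def check_clear_limit(task_ids: List[str],
                      max_cleared: Optional[int] = None) -> List[str]:
    """Return task_ids unchanged if within the limit, otherwise raise.

    max_cleared is an assumed parameter name; None means "no limit".
    """
    if max_cleared is not None and len(task_ids) > max_cleared:
        raise TooManyTasksToClear(
            "refusing to clear {} tasks; limit is {}".format(
                len(task_ids), max_cleared
            )
        )
    return task_ids
```

The idea is that execute() would run a check like this on the tasks it found 
before clearing anything, so an oversized request fails without side effects.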

Another improvement I can think of for the PR is to make it behave like an 
ExternalTaskSensor, i.e. it only clears the target tasks if they are done. If 
the target tasks are still running, it just waits and reschedules itself to try 
clearing them again later.
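
To make the sensor-like behaviour concrete, here is a minimal sketch in plain 
Python. It is only an assumption about how this could work, not something that 
exists in the PR: the set of "done" states is written as plain strings, and 
ready_to_clear, poke and the clear callback are hypothetical names.

```python
from typing import Callable, Iterable

# Assumed set of "done" states, written as plain strings for illustration.
# A real operator would use whatever Airflow itself treats as finished.
FINISHED_STATES = frozenset({"success", "failed", "skipped"})


def ready_to_clear(target_states: Iterable[str]) -> bool:
    """True only when every target task has reached a finished state."""
    return all(state in FINISHED_STATES for state in target_states)


def poke(target_states: Iterable[str], clear: Callable[[], None]) -> bool:
    """One sensor-style attempt.

    Clears the targets and returns True if they are all done; otherwise
    returns False so the caller can reschedule and try again later.
    """
    if not ready_to_clear(list(target_states)):
        return False
    clear()
    return True
```

Returning False instead of failing mirrors how a sensor keeps getting 
rescheduled until its condition is met.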

If there is interest in this new operator or these additional features, please 
mention that on my PR and I'll continue developing it to make it better and 
hopefully get more support from upstream.

> Clearing Tasks Across DAGs
> --------------------------
>
>                 Key: AIRFLOW-2279
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2279
>             Project: Apache Airflow
>          Issue Type: Improvement
>            Reporter: Achal Soni
>            Priority: Major
>         Attachments: cross_dag_ui_screenshot.png
>
>
> At Stripe, we commonly have discrete dags that depend on each other by 
> leveraging ExternalTaskSensors. We also find ourselves routinely wanting to 
> not only clear tasks and their downstream tasks in a particular dag, but also 
> their downstream tasks in their dependent dags (linked by 
> ExternalTaskSensors). 
> We currently have extended Airflow to handle this by modifying the webapp and 
> cli tool to optionally clear dependent tasks across multiple dags (see 
> attached screenshot). 
> We want to open the floor for discussion with the larger Airflow community 
> about the usage of ExternalTaskSensors and specifically how to handle 
> clearing across dags. We are interested in learning more about the accepted 
> practices in this regard, and are very open/willing to contribute in this 
> area if there is interest!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
