> To illustrate the use case, I am going to use this example below.
> task-a ☐ -> task-b ☑ -> task-c ☐
So in this example, are you aware and intending that both task-a and task-c run straight away? Because by skipping task-b, task-c's dependencies would be resolved and it would be eligible to run.

From what you've described, I don't think that is actually what you want; rather, you want it to behave as if the DAG were specified as task-a -> task-c, right?
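To make the concern concrete, here is a minimal sketch in plain Python. It is not Airflow's scheduler code: the `runnable_tasks` helper and the state names are hypothetical, and it assumes a skipped upstream simply counts as "resolved", which is a simplification rather than Airflow's actual trigger-rule logic.

```python
from typing import Dict, List, Optional, Set

# Hypothetical, simplified state names; not Airflow's real State enum.
# Assumption: a skipped upstream is treated as resolved.
RESOLVED_STATES = {"success", "skipped"}


def runnable_tasks(
    upstream: Dict[str, Set[str]],
    states: Dict[str, Optional[str]],
) -> List[str]:
    """Return tasks with no state yet whose upstream tasks are all resolved."""
    return sorted(
        task
        for task, ups in upstream.items()
        if states.get(task) is None
        and all(states.get(up) in RESOLVED_STATES for up in ups)
    )


# task-a -> task-b -> task-c, with task-b pre-marked as skipped.
upstream = {"task-a": set(), "task-b": {"task-a"}, "task-c": {"task-b"}}
states = {"task-a": None, "task-b": "skipped", "task-c": None}

print(runnable_tasks(upstream, states))  # ['task-a', 'task-c']
```

Under that assumption, task-a and task-c are both eligible at the same time, so task-c no longer waits for task-a.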

Honestly though: I'm not sold that this should belong in the Airflow scheduler -- re-running DagRuns on an ad-hoc basis is more aligned with `airflow backfill`.

-ash

On Sun, Jan 30 2022 at 11:21:26 -0800, Hongyi Wang <[email protected]> wrote:
Hi Elad, thank you for your feedback. To answer your question, besides debugging, another common use case is re-running existing DAGs in an ad-hoc manner.

For example, as an Airflow user, I sometimes want to trigger an ad-hoc DAG run. In this run, I want to skip one or more tasks, so the DAG run can yield a different result or simply complete sooner. As I mentioned in my previous email, there are other ways to achieve the same goal. But IMHO, none of them is easy and flexible enough for an ad-hoc use case.

Does that sound like a reasonable use case? What do you think is the best approach to solve it? I am happy to discuss more with you.

On Sun, Jan 30, 2022 at 4:45 AM Elad Kalif <[email protected]> wrote:
Can you describe a use case for the requested feature other than debugging? This doesn't feel like the right approach to test a specific task in a pipeline.

On Fri, Jan 28, 2022 at 11:44 PM Alex Begg <[email protected]> wrote:
Actually, sorry, you can scratch some of what I just said. I thought you were talking about clearing states, but you are instead referring to triggering a DAG run. It does kind of make sense to have a way to trigger a DAG run but only run specific tasks.

On Fri, Jan 28, 2022 at 1:41 PM Alex Begg <[email protected]> wrote:
I believe this is currently possible by just unselecting “downstream” before you click “Clear” in the UI. It should only clear the one middle task and not the downstream task(s).

I would prefer not to have a more detailed UI for skipping specific downstream tasks (or, I want to say, “bypassing”, since “skip” is itself a task state). It might signal to users that specifying tasks to bypass is ideal, when in reality it should only be done on occasion for experiments or troubleshooting, as you mention, and not as a common occurrence.

What I can agree with, though, is that the list of buttons on the dialog window for changing the state of a task looks a bit cluttered. There can probably be a better UI/UX for that, but I don't think being able to check/uncheck downstream tasks is the way to go; that seems like it would be just as cluttered.

Alex Begg

On Fri, Jan 28, 2022 at 11:46 AM Hongyi Wang <[email protected]> wrote:
Hello everyone,

I'd like to propose a new feature in Airflow -- allow users to specify tasks to skip when triggering a DAG run. From our own experience, this feature can be very useful when doing experiments, troubleshooting, or re-running existing DAGs. And I believe it can benefit many Airflow users.
To illustrate the use case, I am going to use this example below.
task-a ☐ -> task-b ☑ -> task-c ☐

Suppose we have a DAG containing 3 tasks. To troubleshoot "task-a" and "task-c", I want to trigger a manual DAG run and skip "task-b" (so I can save time and resources and focus on the other two tasks). To do so, today I have two options:

Option 1: Trigger DAG, then manually mark "task-b" as `SUCCESS`
Option 2: Remove "task-b" from my DAG, then trigger DAG

Neither option is great. Option 1 can be troublesome when the DAG is large and there are multiple tasks I want to skip. Option 2 requires a change to the DAG file, which is not convenient for just troubleshooting.

Therefore, I would love to discuss how we can provide an easy way for users to skip tasks when triggering a DAG.

Things to consider are:
1) We should allow users to specify all tasks to skip at once when triggering a DAG
2) We should retain the dependencies between non-skipped tasks (in the above example, "task-c" won't start until "task-a" completes even if we skip "task-b")
3) We should mark skipped tasks as `SKIPPED` instead of `SUCCESS` to make it more intuitive
4) The implementation should be easy, clean and low risk
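
The dependency-retention requirement can be sketched in plain Python. This is only an illustration under assumptions: `bypass_tasks` is a hypothetical helper, the DAG is modelled as a dict mapping each task to its set of direct upstream tasks, and nothing here touches Airflow's own DAG or scheduler classes.

```python
from typing import Dict, Set


def bypass_tasks(upstream: Dict[str, Set[str]], skip: Set[str]) -> Dict[str, Set[str]]:
    """Return the effective upstream map after bypassing the tasks in `skip`.

    Each remaining task inherits the nearest non-skipped ancestors of any
    skipped upstream task, so ordering between non-skipped tasks is kept.
    """

    def effective_upstreams(task: str, seen: Set[str]) -> Set[str]:
        result: Set[str] = set()
        for up in upstream.get(task, set()):
            if up in seen:
                continue  # already handled via another path
            seen.add(up)
            if up in skip:
                # Look through the skipped task to its own upstreams.
                result |= effective_upstreams(up, seen)
            else:
                result.add(up)
        return result

    return {
        task: effective_upstreams(task, set())
        for task in upstream
        if task not in skip
    }


# task-a -> task-b -> task-c, bypassing task-b
upstream = {"task-a": set(), "task-b": {"task-a"}, "task-c": {"task-b"}}
print(bypass_tasks(upstream, {"task-b"}))
```

For the example DAG, task-c ends up depending directly on task-a, so it still waits for task-a even though task-b is bypassed.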

Here is my proposed solution (tested locally):
Today, Airflow allows users to pass a JSON object to the DagRun as {{dag_run.conf}} when triggering a DAG. The idea is that, before queuing task instances that satisfy their dependencies, `scheduler_job.py` (after we make some changes) will filter the task instances to skip based on the `dag_run.conf` the user passes in (e.g. {"skip_tasks": ["task-b"]}), then mark them as SKIPPED.
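
As a rough sketch of the filtering step, the logic might look like the following. This is standalone Python, not the actual change to `scheduler_job.py`: the `initial_states` helper and the lowercase state strings are hypothetical, and only the `skip_tasks` key comes from the example conf above.

```python
from typing import Any, Dict, Iterable, Optional

SKIP_CONF_KEY = "skip_tasks"  # key name taken from the example conf above


def initial_states(
    task_ids: Iterable[str],
    conf: Optional[Dict[str, Any]],
) -> Dict[str, Optional[str]]:
    """Map each task id to "skipped" if listed in conf, otherwise None.

    Hypothetical helper: raises ValueError when the conf names a task that
    is not in the DAG, so a typo does not silently run the task.
    """
    task_ids = list(task_ids)
    requested = (conf or {}).get(SKIP_CONF_KEY, [])
    if not isinstance(requested, list):
        raise ValueError(f"{SKIP_CONF_KEY!r} must be a list of task ids")

    to_skip = set(requested)
    unknown = to_skip - set(task_ids)
    if unknown:
        raise ValueError(f"Unknown task ids in {SKIP_CONF_KEY!r}: {sorted(unknown)}")

    return {task: ("skipped" if task in to_skip else None) for task in task_ids}


conf = {"skip_tasks": ["task-b"]}
print(initial_states(["task-a", "task-b", "task-c"], conf))
```

A run triggered without any conf leaves every task unset, so the default behaviour is unchanged.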

Things I would love to discuss:
- What do you think about this feature?
- What do you think about the proposed solution?
- Did I miss anything that you want to discuss?
- Is it necessary to introduce a new state (e.g. MANUAL_SKIPPED) to differentiate SKIPPED?
Howie
