Hi Elad, thank you for your feedback. To answer your question, besides debugging, another common use case is re-running existing DAGs in an ad-hoc manner.
For example, as an Airflow user, I sometimes want to trigger an ad-hoc DAG run in which one or more tasks are skipped, so that the run yields a different result or simply completes sooner. As I mentioned in my previous email, there are other ways to achieve the same goal, but IMHO none of them is easy or flexible enough for an ad-hoc use case. Does that sound like a reasonable use case? What do you think is the best approach to solve it? I am happy to discuss this further with you.

On Sun, Jan 30, 2022 at 4:45 AM Elad Kalif wrote:
> Can you describe a use case for the requested feature other than debugging? This doesn't feel like the right approach to test a specific task in a pipeline.
>
> On Fri, Jan 28, 2022 at 11:44 PM Alex Begg wrote:
>> Actually, sorry, you can scratch out some of what I just said. I thought you were talking about clearing states, but you are instead referring to triggering a DAG run. It does kind of make sense to have a way to trigger a DAG run but only run specific tasks.
>>
>> On Fri, Jan 28, 2022 at 1:41 PM Alex Begg wrote:
>>> I believe this is currently possible by just unselecting “downstream” before you click “Clear” in the UI. It should only clear the one middle task and not the downstream task(s).
>>>
>>> I would prefer not to have a more detailed UI that allows users to skip (or I want to say “bypass”, as “skip” is itself a task state) specific downstream tasks, as it might signal to users that specifying tasks to bypass is ideal, when in reality it is only something that should be done on occasion for experiments or troubleshooting, as you mention, not a common occurrence.
>>>
>>> What I can agree with, though, is that the list of buttons on the dialog window for changing the state of a task looks a bit cluttered.
>>> There probably can be a better UI/UX for that, but I do think being able to check/uncheck downstream tasks is a way to go; that seems like it will be just as cluttered.
>>>
>>> Alex Begg
>>>
>>> On Fri, Jan 28, 2022 at 11:46 AM Hongyi Wang wrote:
>>>> Hello everyone,
>>>>
>>>> I'd like to propose a new feature in Airflow: allow users to specify tasks to skip when triggering a DAG run.
>>>>
>>>> From our own experience, this feature can be very useful when running experiments, troubleshooting, or re-running existing DAGs, and I believe it can benefit many Airflow users.
>>>>
>>>> To illustrate the use case, I am going to use the example below:
>>>>
>>>>     task-a ☐ -> task-b ☑ -> task-c ☐
>>>>
>>>> Suppose we have a DAG containing 3 tasks. To troubleshoot "task-a" and "task-c", I want to trigger a manual DAG run and skip "task-b" (so I can save time and resources and focus on the other two tasks). To do so, today I have two options:
>>>>
>>>> Option 1: Trigger the DAG, then manually mark "task-b" as `SUCCESS`.
>>>> Option 2: Remove "task-b" from my DAG, then trigger the DAG.
>>>>
>>>> Neither option is great. Option 1 can be troublesome when the DAG is large and there are multiple tasks I want to skip. Option 2 requires changing the DAG file, which is inconvenient just for troubleshooting.
>>>>
>>>> Therefore, I would love to discuss how we can provide an easy way for users to skip tasks when triggering a DAG.
>>>>
>>>> Things to consider are:
>>>> 1) We should allow users to specify all tasks to skip at once when triggering a DAG.
>>>> 2) We should retain the dependencies between non-skipped tasks (in the example above, "task-c" won't start until "task-a" completes, even though "task-b" is skipped).
>>>> 3) We should mark skipped tasks as `SKIPPED` instead of `SUCCESS` to make it more intuitive.
>>>> 4) The implementation should be easy, clean, and low-risk.
>>>>
>>>> Here is my proposed solution (tested locally): today, Airflow allows users to pass a JSON object to the DAG run, available as {{dag_run.conf}}, when triggering a DAG. The idea is that, before queuing task instances whose dependencies are satisfied, `scheduler_job.py` (with some changes) will filter out the task instances to skip based on the `dag_run.conf` the user passes in (e.g. {"skip_tasks": ["task-b"]}) and mark them as SKIPPED.
>>>>
>>>> Things I would love to discuss:
>>>> - What do you think about this feature?
>>>> - What do you think about the proposed solution?
>>>> - Did I miss anything that you want to discuss?
>>>> - Is it necessary to introduce a new state (e.g. MANUAL_SKIPPED) to differentiate it from SKIPPED?
>>>>
>>>> Howie
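To make the proposal concrete, here is a minimal sketch in plain Python that models the intended scheduling behavior outside of Airflow. All names (`State`, `simulate_run`, the `upstream` mapping) are hypothetical, and this is not Airflow's scheduler code; only the `skip_tasks` conf key comes from the proposal. Tasks listed in `skip_tasks` are marked SKIPPED instead of being run once their upstream tasks are done, and a SKIPPED task counts as finished for its downstream tasks, so "task-c" still waits for "task-a" (point 2). My understanding is that this differs from Airflow's default `all_success` trigger rule, under which a skipped upstream task causes its downstream tasks to be skipped too, so a real implementation would need to handle that case explicitly.

```python
from enum import Enum
from typing import Any, Dict, List, Mapping, Sequence, Tuple


class State(str, Enum):
    """Hypothetical subset of task-instance states used by this sketch."""
    NONE = "none"        # not resolved yet
    SUCCESS = "success"  # task ran (simulated) and succeeded
    SKIPPED = "skipped"  # task was listed in conf["skip_tasks"] and never ran


def simulate_run(
    upstream: Mapping[str, Sequence[str]], conf: Mapping[str, Any]
) -> List[Tuple[str, State]]:
    """Resolve every task of a DAG, honoring a hypothetical "skip_tasks" conf key.

    `upstream` maps each task_id to the task_ids it depends on.
    Returns (task_id, final_state) pairs in the order the tasks were resolved.
    """
    skip = set(conf.get("skip_tasks", []))
    unknown = skip.difference(upstream)
    if unknown:
        # Fail loudly on typos instead of silently running every task.
        raise ValueError(f"unknown task ids in skip_tasks: {sorted(unknown)}")
    for task_id, deps in upstream.items():
        missing = set(deps).difference(upstream)
        if missing:
            raise ValueError(f"{task_id} depends on unknown tasks: {sorted(missing)}")

    states: Dict[str, State] = {task_id: State.NONE for task_id in upstream}
    done = {State.SUCCESS, State.SKIPPED}  # both count as "upstream finished"
    order: List[Tuple[str, State]] = []

    while any(state is State.NONE for state in states.values()):
        # A task is ready once all of its upstream tasks are SUCCESS or SKIPPED,
        # so dependencies between non-skipped tasks are preserved.
        ready = sorted(
            task_id
            for task_id, state in states.items()
            if state is State.NONE and all(states[u] in done for u in upstream[task_id])
        )
        if not ready:
            raise ValueError("dependency cycle detected")
        for task_id in ready:
            # Skipped tasks are marked SKIPPED (not SUCCESS) and never executed.
            states[task_id] = State.SKIPPED if task_id in skip else State.SUCCESS
            order.append((task_id, states[task_id]))
    return order
```

For the three-task example, `simulate_run({"task-a": [], "task-b": ["task-a"], "task-c": ["task-b"]}, {"skip_tasks": ["task-b"]})` resolves "task-a" as SUCCESS, then "task-b" as SKIPPED, then "task-c" as SUCCESS. Rejecting unknown task IDs up front is an assumed design choice, so that a typo in the conf fails loudly instead of silently running the whole DAG.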
