Actually, sorry, you can scratch some of what I just said. I thought you were talking about clearing task states, but you are instead referring to triggering a DAG run. It does make sense to have a way to trigger a DAG run but only run specific tasks.
On Fri, Jan 28, 2022 at 1:41 PM Alex Begg <[email protected]> wrote:

> I believe this is currently possible by just unselecting “downstream”
> before you click “Clear” in the UI. It should only clear the one middle
> task and not the downstream task(s).
>
> I would prefer to not have a more detailed UI to allow to skip (or i want
> to say “bypass” as “skip” is itself a task state) specific downstream tasks
> as it might signal to users that it is ideal to specify tasks to bypass
> when in reality it is only something that should be done on occasion for
> experiment or troubleshooting as you mention, not a common occurrence.
>
> What I can agree to though is the list of buttons on the dialog window to
> change state of a task is a bit cluttered looking. There probably can be a
> better UI/UX for that, but I do think being able to check/uncheck
> downstream task is a way to go, that seems like it will be just as
> cluttered.
>
> Alex Begg
>
> On Fri, Jan 28, 2022 at 11:46 AM Hongyi Wang <[email protected]> wrote:
>
>> Hello everyone,
>>
>> I'd like to propose a new feature in Airflow -- allow users to specify tasks
>> to skip when trigger DAG run.
>>
>> From our own experience, this feature can be very useful when doing
>> experiments, troubleshooting or re-running existing DAGs. And I believe it
>> can benefit many Airflow users.
>>
>> To illustrate the use case, I am going to use this example below.
>>
>>     task-a ☐ -> task-b ☑ -> task-c ☐
>>
>> Suppose we have a DAG containing 3 tasks. To troubleshoot "task-a" and
>> "task-c", I want to trigger a manual DAG run and skip "task-b" (so I can
>> save time & resource & focus on other two tasks). To do so, today I have two
>> options:
>>
>> Option 1: Trigger DAG, then manually mark "task-b" as `SUCCESS`
>> Option 2: Remove "task-b" from my DAG, then trigger DAG
>>
>> Neither of the options are great. Option 1 can be troublesome when DAG is
>> large, and there are multiple tasks I want to skip. Option 2 requires change
>> in the DAG file, which is not convenient for just troubleshooting.
>>
>> Therefore, I would love to discuss how we can provide an easy way for users
>> to skip tasks when triggering DAG.
>>
>> Things to consider are:
>> 1) We should allow user to specify all tasks to skip at once when trigger DAG
>> 2) We should retain the dependencies between non-skip tasks (in above
>> example, "task-c" won't start until "task-a" completes even if we skipped
>> "task-b")
>> 3) We should mark skipped task as `SKIPPED` instead of `SUCCESS` to make it
>> more intuitive
>> 4) The implementation should be easy, clean and low risk
>>
>> Here is my proposed solution (tested locally):
>> Today, Airflow allow user to pass a JSON to the Dagrun as {{dag_run.conf}}
>> when triggering DAG. The idea is, before queuing task instances that
>> satisfies dependences, `scheduler_job.py` (after we make some change) will
>> filter task instances to skip based on `dag_run.conf` user passes in (e.g.
>> {"skip_tasks": ["task-b"]}), then mark them as SKIPPED.
>>
>> Things I would love to discuss:
>> - What do you think about this feature?
>> - What do you think about the proposed solution?
>> - Did I miss anything that you want to discuss?
>> - Is it necessary to introduce a new state (e.g. MANUAL_SKIPPED) to
>> differentiate SKIPPED?
>>
>> Howie
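
To make the proposed behaviour concrete, here is a rough, standalone sketch of the filtering step in plain Python. It is not Airflow code and does not touch `scheduler_job.py`: the function `plan_run` and the `upstream` mapping are made up for illustration, and only the `skip_tasks` conf key comes from the example in the proposal. It assumes a skipped task counts as finished for ordering purposes, so the dependencies between the remaining tasks are retained (point 2 above).

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Conf key taken from the example in the proposal; not an existing Airflow setting.
SKIP_KEY = "skip_tasks"


def plan_run(upstream, conf_json):
    """Return (task_id, planned_state) pairs in a dependency-respecting order.

    `upstream` is a hypothetical stand-in for a DAG: it maps each task id to
    the set of task ids it depends on. `conf_json` is the JSON string a user
    would pass when triggering the run.
    """
    conf = json.loads(conf_json) if conf_json else {}
    to_skip = set(conf.get(SKIP_KEY, []))

    # Reject ids that are not in the DAG so a typo does not go unnoticed.
    unknown = to_skip - set(upstream)
    if unknown:
        raise ValueError(f"unknown task ids in {SKIP_KEY}: {sorted(unknown)}")

    plan = []
    # A skipped task keeps its place in the order, so downstream tasks still
    # wait for everything upstream of it.
    for task_id in TopologicalSorter(upstream).static_order():
        state = "skipped" if task_id in to_skip else "run"
        plan.append((task_id, state))
    return plan


# The three-task chain from the proposal: task-a -> task-b -> task-c.
upstream = {
    "task-a": set(),
    "task-b": {"task-a"},
    "task-c": {"task-b"},
}

print(plan_run(upstream, '{"skip_tasks": ["task-b"]}'))
```

In this sketch "task-c" is still ordered after "task-a" even though "task-b" is marked as skipped. Whether a real implementation should reuse `SKIPPED` or add a separate state is left open, as in the questions above.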
