uranusjr opened a new issue #18758:
URL: https://github.com/apache/airflow/issues/18758


   ### Description
   
   Currently (in versions up to 2.1.4), `airflow dags test <dag_id> 
<execution_date>` creates a *backfill* run at the specified datetime. This, 
however, applies regardless of whether the DAG can actually have a _logically_ 
automated backfill at that specific datetime or not. One example of this 
logically confusing behaviour is shown in #18473. A DAG with 
`schedule_interval=None` should logically have no backfill runs ever, but the 
`test` command would still happily create a backfill run at that datetime.
   
   With the introduction of custom timetables in AIP-39, the DAG scheduling 
logic went through some extensive refactoring to conform more closely to the 
DAG's schedule/timetable specification. This means that a backfill run can no 
longer be created at will. The 2.2 release will contain a hack to keep the 
current behaviour of "free" backfill run creation via `test` (#18742), but I 
would prefer this to be a temporary measure and be removed once we have a 
better solution.
   
   The root cause of this issue is, IMO, that `airflow dags test` has very poor 
semantics as currently designed. It is entirely non-obvious that it creates 
*backfill* runs (so a subsequent `airflow dags backfill` call would skip the 
specific datetime if and only if it lies on the logical schedule), or why a 
backfill can happen without considering the schedule (it is the only way to do 
that in Airflow AFAIK). The name `test` itself is also somewhat of a misnomer: 
why is creating a backfill run a test in the first place?
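   
   To make the "if and only if" above concrete, here is a toy model in plain 
Python. It is not Airflow code: the helper names and the fixed-interval 
schedule are assumptions made purely for illustration, with `interval=None` 
standing in for `schedule_interval=None`.

```python
from datetime import datetime, timedelta
from typing import Optional, Set


def lies_on_schedule(
    moment: datetime, start: datetime, interval: Optional[timedelta]
) -> bool:
    """Toy check: is `moment` a logical date of a fixed-interval schedule?

    `interval=None` models a DAG with no schedule, which has no logical
    dates at all. (Hypothetical helper, not an Airflow API.)
    """
    if interval is None or moment < start:
        return False
    return (moment - start) % interval == timedelta(0)


def backfill_would_skip(
    moment: datetime,
    start: datetime,
    interval: Optional[timedelta],
    existing_runs: Set[datetime],
) -> bool:
    """A later backfill only visits dates on the schedule, so it can only
    skip `moment` if the date is on the schedule AND a run already exists.
    (Hypothetical helper, not an Airflow API.)
    """
    return lies_on_schedule(moment, start, interval) and moment in existing_runs


start = datetime(2021, 10, 1)
daily = timedelta(days=1)
on_schedule = datetime(2021, 10, 3)        # midnight, matches the daily schedule
off_schedule = datetime(2021, 10, 3, 12)   # noon, between two logical dates

# Pretend `airflow dags test` created a run at both datetimes.
runs_created_by_test = {on_schedule, off_schedule}

print(backfill_would_skip(on_schedule, start, daily, runs_created_by_test))   # True
print(backfill_would_skip(off_schedule, start, daily, runs_created_by_test))  # False
print(backfill_would_skip(on_schedule, start, None, runs_created_by_test))    # False
```

   In this model the off-schedule run and the run on an unscheduled DAG are 
never revisited by a later backfill, which is why labelling them as backfill 
runs is confusing.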
   
   ### Use case/motivation
   
   From what I can tell, the current primary use case for `airflow dags test` 
is to check whether a DAG implements its tasks reasonably before it's 
activated. For this particular use case, the user does not actually care what 
kind of run is used, so a manual run would do. But we should also create a 
migration path for those relying on `airflow dags test` to create a backfill 
run, since the implied side effect of sparing a later backfill run at that 
datetime is also somewhat useful.
   
   So the plan I currently have in mind is:
   
   * Add a new flag to `airflow dags trigger` to allow triggering a manual run 
and executing it directly in the console (instead of sending it to the 
scheduler). This will need some new mechanism since `trigger` is currently 
implemented by `DAG.create_dagrun()`. I think we'll need a new job class, e.g. 
`ManualRunJob`.
   * Add a new flag to `airflow dags backfill` to do the same thing, but with a 
backfill run. This would cover the exact same use case as `airflow dags test` 
does right now, but with more obvious semantics. The syntax would, however, be 
significantly more verbose, so we need to work on that as well.
   * Deprecate `airflow dags test`, since its usage can be covered by the 
above two additions.
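   
   A rough sketch of how the two proposed flags might look from the command 
line, using a standalone `argparse` parser rather than Airflow's real CLI. The 
flag name `--in-console` and the program name are placeholders invented for 
this sketch; the actual flag name and implementation are not decided here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Standalone mock of the proposal; this is NOT Airflow's CLI parser."""
    parser = argparse.ArgumentParser(prog="airflow-dags-sketch")
    sub = parser.add_subparsers(dest="command", required=True)

    # Hypothetical: `trigger` gains a flag to execute the manual run locally.
    trigger = sub.add_parser("trigger")
    trigger.add_argument("dag_id")
    trigger.add_argument(
        "--in-console",  # placeholder flag name, assumed for illustration
        action="store_true",
        help="execute the manual run directly in the console",
    )

    # Hypothetical: `backfill` gains the same flag, but for a backfill run.
    backfill = sub.add_parser("backfill")
    backfill.add_argument("dag_id")
    backfill.add_argument("--start-date")
    backfill.add_argument("--end-date")
    backfill.add_argument(
        "--in-console",  # placeholder flag name, assumed for illustration
        action="store_true",
        help="execute the backfill run directly in the console",
    )
    return parser


def describe(args: argparse.Namespace) -> str:
    """Summarise what the parsed command would do under this sketch."""
    run_type = "manual" if args.command == "trigger" else "backfill"
    where = (
        "directly in the console"
        if args.in_console
        else "with the command's default behaviour"
    )
    return f"{run_type} run of {args.dag_id}, executed {where}"


parser = build_parser()
print(describe(parser.parse_args(["trigger", "example_dag", "--in-console"])))
print(
    describe(
        parser.parse_args(
            [
                "backfill",
                "example_dag",
                "--start-date", "2021-10-03",
                "--end-date", "2021-10-03",
                "--in-console",
            ]
        )
    )
)
```

   The second invocation also shows the verbosity concern: covering a single 
datetime with `backfill` means repeating it as both the start and the end date.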
   
   ### Related issues
   
   Issue raised against 2.2 beta about the changed behaviour: #18473
   PR to "restore" the pre-2.2 behaviour: #18742
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   



