Hello Kaxil and thanks for the questions. Let me answer them... 1. We don't, but since we want to have one file that will be an example DAG and also a system test at the same time, we need to decide where to store them. We have 2 options: store them as tests somewhere in `tests` directory or store them as example DAGs (e.g. in providers directory). Since they fulfill their 'example' role using dynamic documentation that is generated from code (no matter where the files are stored), this purpose we have already covered. The 'test' role, on the other hand, is fulfilled by storing them in the right place, indicating that these are actually tests that can be executed. Also, we decided to move them because since we are planning to do the redesign, we can find a better place for them. `tests/system/providers` seems like a proper place for system tests. 2. I am not for or against setting up the DebugExecutor inside the environment variable, but I would like to see strong arguments against using it the way it is described in the AIP, so that it can possibly break the execution in some specific cases. Examples would be very desired. 3. You're right, if all tasks do not have any trigger rules, we don't need the watcher task. But since probably most of them will have teardown tasks (tasks with trigger_rule set to "all_done" to do the cleanup after test) we need to introduce this watcher task to handle those. We actually don't need to use the watcher task in DAGs without teardown tasks, but it's actually better to use it always because it doesn't provide anything other than to support the possibility of having a teardown task, so if anyone in the future decides to add some task to remove some resources, watcher task will be already there to handle it. It doesn't bring that much complexity to the code and computational consumption. Also, we are going to have a precommit hook that will check against absence of this piece of code and assure that it's there in every test. Of course, it's only my opinion and if we later decide to not use watcher task where it is not essentially needed, we can avoid adding it, but having it everywhere creates a kind of 'framework' for our tests and saves us from possible mistakes.
One of the concepts of the new design is to minimize the amount of environment variables to maintain. When I was fixing and refactoring system tests, the part that I found the most difficult was maintaining the list of environment variables. Even if every possible variable has the "good" default, it was not running successfully on the first run. I needed to check all the values of variables and see if they are sensible. They weren't. The problem is that they age very fast and with every change to the Airflow, they become brittle, outdated, useless. One of the main pillars of the new design of system tests is to minimize the number of environment variables needed to be maintained so that the DAGs just work without that much hassle required to configure them. I hope it delivers answers your concerns. Kind regards, Mateusz Nojek pon., 21 lut 2022 o 18:31 Kaxil Naik <[email protected]> napisał(a): > Apologies for chiming in late. I read the AIP now, please find my > questions/comments below: > > 1. Why do we need to move the DAGs from example_dags ? > 2. Hardcoding DebugExecutor using environment variables like that > doesn't feel right. Might cause issues when running with other executors. > 3. The Watcher task - why do we need that? If the example DAGs don't > set any Trigger rules, the default Trigger rule is "all_success" so the > DagRun will fail if any task fails. > > The only thing I think is needed to make the example DAGs actually run in > CI/locally is making sure the values comes from Environment Variables while > the DAG itself have sensible defaults. > > For instance, the following example should work fine if the environment > variables are set. > > AIRFLOW__CORE__EXECUTOR=DebugExecutor > python > airflow/providers/google/cloud/example_dags/example_gcs_timespan_file_transform.py > > What would be ideal (according to me) is to have airflow test run work > without execution_date which will take care of running it via > DebugExecutor. (docs > <https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#test> > ) > > This opens up the possibility for us in future to provide more useful > debug logs (like import time, number of DAGs, warn when a DAG doesn't > follow best practices etc) via the "airflow test" command. While this is > out of scope of the current AIP, I feel it might be a great value add. > > Just my 2-cents. > > Regards, > Kaxil > > > > On Mon, 21 Feb 2022 at 07:20, Mateusz Nojek <[email protected]> wrote: > >> Hello everyone, >> >> Since there was no response from the community during the weekend on this >> voting, I am extending it for the next 48 hours, so it will end on >> Wednesday 23rd February 9 AM CET. >> >> Current state is: >> 2/3 required binding votes, >> additional 4 non-binding votes. >> >> Kind regards, >> Mateusz Nojek >> >> >> niedz., 20 lut 2022 o 12:01 Mateusz Nojek <[email protected]> >> napisał(a): >> >>> Hi everyone, >>> >>> It's less than 24 hours till the end of the voting. >>> If anyone of you still hesitates, this is the time to share your opinion >>> with a vote on this AIP :) >>> Thank you! >>> >>> Kind regards, >>> Mateusz Nojek >>> >>> >>> pt., 18 lut 2022 o 21:36 Tomasz Urbaszek <[email protected]> >>> napisał(a): >>> >>>> +1 (binding) >>>> >>>> Tomek >>>> >>>
