Sounds good. While I don't think we should enforce the "watcher" task for all the example DAGs, it isn't a blocker.
+1 (binding) On Tue, 22 Feb 2022 at 07:43, Mateusz Nojek <[email protected]> wrote: > Hello Kaxil and thanks for the questions. > Let me answer them... > > 1. We don't, but since we want to have one file that will be an > example DAG and also a system test at the same time, we need to decide > where to store them. We have 2 options: store them as tests somewhere in > `tests` directory or store them as example DAGs (e.g. in providers > directory). Since they fulfill their 'example' role using dynamic > documentation that is generated from code (no matter where the files are > stored), this purpose we have already covered. The 'test' role, on the > other hand, is fulfilled by storing them in the right place, indicating > that these are actually tests that can be executed. Also, we decided to > move them because since we are planning to do the redesign, we can find a > better place for them. `tests/system/providers` seems like a proper place > for system tests. > 2. I am not for or against setting up the DebugExecutor inside the > environment variable, but I would like to see strong arguments against > using it the way it is described in the AIP, so that it can possibly break > the execution in some specific cases. Examples would be very desired. > 3. You're right, if all tasks do not have any trigger rules, we don't > need the watcher task. But since probably most of them will have teardown > tasks (tasks with trigger_rule set to "all_done" to do the cleanup after > test) we need to introduce this watcher task to handle those. We actually > don't need to use the watcher task in DAGs without teardown tasks, but it's > actually better to use it always because it doesn't provide anything other > than to support the possibility of having a teardown task, so if anyone in > the future decides to add some task to remove some resources, watcher task > will be already there to handle it. It doesn't bring that much complexity > to the code and computational consumption. Also, we are going to have a > precommit hook that will check against absence of this piece of code and > assure that it's there in every test. Of course, it's only my opinion and > if we later decide to not use watcher task where it is not essentially > needed, we can avoid adding it, but having it everywhere creates a kind of > 'framework' for our tests and saves us from possible mistakes. > > One of the concepts of the new design is to minimize the amount of > environment variables to maintain. When I was fixing and refactoring system > tests, the part that I found the most difficult was maintaining the list of > environment variables. Even if every possible variable has the "good" > default, it was not running successfully on the first run. I needed to > check all the values of variables and see if they are sensible. They > weren't. The problem is that they age very fast and with every change to > the Airflow, they become brittle, outdated, useless. One of the main > pillars of the new design of system tests is to minimize the number of > environment variables needed to be maintained so that the DAGs just work > without that much hassle required to configure them. > > I hope it delivers answers your concerns. > > Kind regards, > Mateusz Nojek > > > pon., 21 lut 2022 o 18:31 Kaxil Naik <[email protected]> napisał(a): > >> Apologies for chiming in late. I read the AIP now, please find my >> questions/comments below: >> >> 1. Why do we need to move the DAGs from example_dags ? >> 2. Hardcoding DebugExecutor using environment variables like that >> doesn't feel right. Might cause issues when running with other executors. >> 3. The Watcher task - why do we need that? If the example DAGs don't >> set any Trigger rules, the default Trigger rule is "all_success" so the >> DagRun will fail if any task fails. >> >> The only thing I think is needed to make the example DAGs actually run in >> CI/locally is making sure the values comes from Environment Variables while >> the DAG itself have sensible defaults. >> >> For instance, the following example should work fine if the environment >> variables are set. >> >> AIRFLOW__CORE__EXECUTOR=DebugExecutor >> python >> airflow/providers/google/cloud/example_dags/example_gcs_timespan_file_transform.py >> >> What would be ideal (according to me) is to have airflow test run work >> without execution_date which will take care of running it via >> DebugExecutor. (docs >> <https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#test> >> ) >> >> This opens up the possibility for us in future to provide more useful >> debug logs (like import time, number of DAGs, warn when a DAG doesn't >> follow best practices etc) via the "airflow test" command. While this is >> out of scope of the current AIP, I feel it might be a great value add. >> >> Just my 2-cents. >> >> Regards, >> Kaxil >> >> >> >> On Mon, 21 Feb 2022 at 07:20, Mateusz Nojek <[email protected]> wrote: >> >>> Hello everyone, >>> >>> Since there was no response from the community during the weekend on >>> this voting, I am extending it for the next 48 hours, so it will end on >>> Wednesday 23rd February 9 AM CET. >>> >>> Current state is: >>> 2/3 required binding votes, >>> additional 4 non-binding votes. >>> >>> Kind regards, >>> Mateusz Nojek >>> >>> >>> niedz., 20 lut 2022 o 12:01 Mateusz Nojek <[email protected]> >>> napisał(a): >>> >>>> Hi everyone, >>>> >>>> It's less than 24 hours till the end of the voting. >>>> If anyone of you still hesitates, this is the time to share your >>>> opinion with a vote on this AIP :) >>>> Thank you! >>>> >>>> Kind regards, >>>> Mateusz Nojek >>>> >>>> >>>> pt., 18 lut 2022 o 21:36 Tomasz Urbaszek <[email protected]> >>>> napisał(a): >>>> >>>>> +1 (binding) >>>>> >>>>> Tomek >>>>> >>>>
