Hello Kaxil and thanks for the questions.
Let me answer them...

   1. We don't, but since we want to have one file that will be an example
   DAG and also a system test at the same time, we need to decide where to
   store them. We have 2 options: store them as tests somewhere in `tests`
   directory or store them as example DAGs (e.g. in providers directory).
   Since they fulfill their 'example' role using dynamic documentation that is
   generated from code (no matter where the files are stored), this purpose we
   have already covered. The 'test' role, on the other hand, is fulfilled by
   storing them in the right place, indicating that these are actually tests
   that can be executed. Also, we decided to move them because since we are
   planning to do the redesign, we can find a better place for them.
   `tests/system/providers` seems like a proper place for system tests.
   2. I am not for or against setting up the DebugExecutor inside the
   environment variable, but I would like to see strong arguments against
   using it the way it is described in the AIP, so that it can possibly break
   the execution in some specific cases. Examples would be very desired.
   3. You're right, if all tasks do not have any trigger rules, we don't
   need the watcher task. But since probably most of them will have teardown
   tasks (tasks with trigger_rule set to "all_done" to do the cleanup after
   test) we need to introduce this watcher task to handle those. We actually
   don't need to use the watcher task in DAGs without teardown tasks, but it's
   actually better to use it always because it doesn't provide anything other
   than to support the possibility of having a teardown task, so if anyone in
   the future decides to add some task to remove some resources, watcher task
   will be already there to handle it. It doesn't bring that much complexity
   to the code and computational consumption. Also, we are going to have a
   precommit hook that will check against absence of this piece of code and
   assure that it's there in every test. Of course, it's only my opinion and
   if we later decide to not use watcher task where it is not essentially
   needed, we can avoid adding it, but having it everywhere creates a kind of
   'framework' for our tests and saves us from possible mistakes.

One of the concepts of the new design is to minimize the amount of
environment variables to maintain. When I was fixing and refactoring system
tests, the part that I found the most difficult was maintaining the list of
environment variables. Even if every possible variable has the "good"
default, it was not running successfully on the first run. I needed to
check all the values of variables and see if they are sensible. They
weren't. The problem is that they age very fast and with every change to
the Airflow, they become brittle, outdated, useless. One of the main
pillars of the new design of system tests is to minimize the number of
environment variables needed to be maintained so that the DAGs just work
without that much hassle required to configure them.

I hope it delivers answers your concerns.

Kind regards,
Mateusz Nojek


pon., 21 lut 2022 o 18:31 Kaxil Naik <[email protected]> napisał(a):

> Apologies for chiming in late. I read the AIP now, please find my
> questions/comments below:
>
>    1. Why do we need to move the DAGs from example_dags ?
>    2. Hardcoding DebugExecutor using environment variables like that
>    doesn't feel right. Might cause issues when running with other executors.
>    3. The Watcher task - why do we need that? If the example DAGs don't
>    set any Trigger rules, the default Trigger rule is "all_success" so the
>    DagRun will fail if any task fails.
>
> The only thing I think is needed to make the example DAGs actually run in
> CI/locally is making sure the values comes from Environment Variables while
> the DAG itself have sensible defaults.
>
> For instance, the following example should work fine if the environment
> variables are set.
>
> AIRFLOW__CORE__EXECUTOR=DebugExecutor
> python 
> airflow/providers/google/cloud/example_dags/example_gcs_timespan_file_transform.py
>
> What would be ideal (according to me) is to have airflow test run work
> without execution_date which will take care of running it via
> DebugExecutor. (docs
> <https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#test>
> )
>
> This opens up the possibility for us in future to provide more useful
> debug logs (like import time, number of DAGs, warn when a DAG doesn't
> follow best practices etc) via the "airflow test" command. While this is
> out of scope of the current AIP, I feel it might be a great value add.
>
> Just my 2-cents.
>
> Regards,
> Kaxil
>
>
>
> On Mon, 21 Feb 2022 at 07:20, Mateusz Nojek <[email protected]> wrote:
>
>> Hello everyone,
>>
>> Since there was no response from the community during the weekend on this
>> voting, I am extending it for the next 48 hours, so it will end on
>> Wednesday 23rd February 9 AM CET.
>>
>> Current state is:
>> 2/3 required binding votes,
>> additional 4 non-binding votes.
>>
>> Kind regards,
>> Mateusz Nojek
>>
>>
>> niedz., 20 lut 2022 o 12:01 Mateusz Nojek <[email protected]>
>> napisał(a):
>>
>>> Hi everyone,
>>>
>>> It's less than 24 hours till the end of the voting.
>>> If anyone of you still hesitates, this is the time to share your opinion
>>> with a vote on this AIP :)
>>> Thank you!
>>>
>>> Kind regards,
>>> Mateusz Nojek
>>>
>>>
>>> pt., 18 lut 2022 o 21:36 Tomasz Urbaszek <[email protected]>
>>> napisał(a):
>>>
>>>> +1 (binding)
>>>>
>>>> Tomek
>>>>
>>>

Reply via email to