Re: [VOTE] AIP-47 New design of Airflow System Tests

Jarek Potiuk Thu, 24 Feb 2022 05:48:26 -0800

One comment to Kaxil. Fully agree.

I think that was a left-over from the previous attempt to define watcher.
What we wanted to make sure is that all tasks are included when watcher
dependency is added. This was mostly when we wanted to use:
[task1, task2, task3] >> watcher


This was really because we foresaw that someone can add new tasks, we do
not forget to add it ito the list.
But with `dag.list(tasks) >> watcher()`, it's not really needed. Our check
for watcher will be much simpler:

If watcher task (name convention) is added to example_dags, make sure that
the last significant  line of the example dag is `dag.list(tasks) >>
watcher`

That's about it. We do not need anything more and certainly there will be a
number of example_dags that will have no watcher at all.

J.


czw., 24 lut 2022, 08:23 użytkownik Mateusz Nojek <[email protected]>
napisał:

> Hello again :)
>
> The vote has passed with 3 binding +1 votes (and 4 non-binding +1 votes):
>
> Jarek Potiuk +1 (binding)
> Tomasz Urbaszek +1 (binding)
> Kaxil Naik +1 (binding)
> Bartłomiej Hirsz +1 (non-binding)
> Eugen Kosteev +1 (non-binding)
> Mars Group +1 (non-binding)
> Łukasz Wyszomirski +1 (non-binding)
>
> Thanks everyone!
>
> We will proceed with the implementation.
>
> Kind regards,
> Mateusz Nojek
>
>
> śr., 23 lut 2022 o 21:04 Kaxil Naik <[email protected]> napisał(a):
>
>> Sounds good. While I don't think we should enforce the "watcher" task for
>> all the example DAGs, it isn't a blocker.
>>
>> +1 (binding)
>>
>> On Tue, 22 Feb 2022 at 07:43, Mateusz Nojek <[email protected]> wrote:
>>
>>> Hello Kaxil and thanks for the questions.
>>> Let me answer them...
>>>
>>>    1. We don't, but since we want to have one file that will be an
>>>    example DAG and also a system test at the same time, we need to decide
>>>    where to store them. We have 2 options: store them as tests somewhere in
>>>    `tests` directory or store them as example DAGs (e.g. in providers
>>>    directory). Since they fulfill their 'example' role using dynamic
>>>    documentation that is generated from code (no matter where the files are
>>>    stored), this purpose we have already covered. The 'test' role, on the
>>>    other hand, is fulfilled by storing them in the right place, indicating
>>>    that these are actually tests that can be executed. Also, we decided to
>>>    move them because since we are planning to do the redesign, we can find a
>>>    better place for them. `tests/system/providers` seems like a proper place
>>>    for system tests.
>>>    2. I am not for or against setting up the DebugExecutor inside the
>>>    environment variable, but I would like to see strong arguments against
>>>    using it the way it is described in the AIP, so that it can possibly 
>>> break
>>>    the execution in some specific cases. Examples would be very desired.
>>>    3. You're right, if all tasks do not have any trigger rules, we
>>>    don't need the watcher task. But since probably most of them will have
>>>    teardown tasks (tasks with trigger_rule set to "all_done" to do the 
>>> cleanup
>>>    after test) we need to introduce this watcher task to handle those. We
>>>    actually don't need to use the watcher task in DAGs without teardown 
>>> tasks,
>>>    but it's actually better to use it always because it doesn't provide
>>>    anything other than to support the possibility of having a teardown task,
>>>    so if anyone in the future decides to add some task to remove some
>>>    resources, watcher task will be already there to handle it. It doesn't
>>>    bring that much complexity to the code and computational consumption. 
>>> Also,
>>>    we are going to have a precommit hook that will check against absence of
>>>    this piece of code and assure that it's there in every test. Of course,
>>>    it's only my opinion and if we later decide to not use watcher task where
>>>    it is not essentially needed, we can avoid adding it, but having it
>>>    everywhere creates a kind of 'framework' for our tests and saves us from
>>>    possible mistakes.
>>>
>>> One of the concepts of the new design is to minimize the amount of
>>> environment variables to maintain. When I was fixing and refactoring system
>>> tests, the part that I found the most difficult was maintaining the list of
>>> environment variables. Even if every possible variable has the "good"
>>> default, it was not running successfully on the first run. I needed to
>>> check all the values of variables and see if they are sensible. They
>>> weren't. The problem is that they age very fast and with every change to
>>> the Airflow, they become brittle, outdated, useless. One of the main
>>> pillars of the new design of system tests is to minimize the number of
>>> environment variables needed to be maintained so that the DAGs just work
>>> without that much hassle required to configure them.
>>>
>>> I hope it delivers answers your concerns.
>>>
>>> Kind regards,
>>> Mateusz Nojek
>>>
>>>
>>> pon., 21 lut 2022 o 18:31 Kaxil Naik <[email protected]> napisał(a):
>>>
>>>> Apologies for chiming in late. I read the AIP now, please find my
>>>> questions/comments below:
>>>>
>>>>    1. Why do we need to move the DAGs from example_dags ?
>>>>    2. Hardcoding DebugExecutor using environment variables like that
>>>>    doesn't feel right. Might cause issues when running with other 
>>>> executors.
>>>>    3. The Watcher task - why do we need that? If the example DAGs
>>>>    don't set any Trigger rules, the default Trigger rule is "all_success" 
>>>> so
>>>>    the DagRun will fail if any task fails.
>>>>
>>>> The only thing I think is needed to make the example DAGs actually run
>>>> in CI/locally is making sure the values comes from Environment Variables
>>>> while the DAG itself have sensible defaults.
>>>>
>>>> For instance, the following example should work fine if the environment
>>>> variables are set.
>>>>
>>>> AIRFLOW__CORE__EXECUTOR=DebugExecutor
>>>> python 
>>>> airflow/providers/google/cloud/example_dags/example_gcs_timespan_file_transform.py
>>>>
>>>> What would be ideal (according to me) is to have airflow test run work
>>>> without execution_date which will take care of running it via
>>>> DebugExecutor. (docs
>>>> <https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#test>
>>>> )
>>>>
>>>> This opens up the possibility for us in future to provide more useful
>>>> debug logs (like import time, number of DAGs, warn when a DAG doesn't
>>>> follow best practices etc) via the "airflow test" command. While this is
>>>> out of scope of the current AIP, I feel it might be a great value add.
>>>>
>>>> Just my 2-cents.
>>>>
>>>> Regards,
>>>> Kaxil
>>>>
>>>>
>>>>
>>>> On Mon, 21 Feb 2022 at 07:20, Mateusz Nojek <[email protected]> wrote:
>>>>
>>>>> Hello everyone,
>>>>>
>>>>> Since there was no response from the community during the weekend on
>>>>> this voting, I am extending it for the next 48 hours, so it will end on
>>>>> Wednesday 23rd February 9 AM CET.
>>>>>
>>>>> Current state is:
>>>>> 2/3 required binding votes,
>>>>> additional 4 non-binding votes.
>>>>>
>>>>> Kind regards,
>>>>> Mateusz Nojek
>>>>>
>>>>>
>>>>> niedz., 20 lut 2022 o 12:01 Mateusz Nojek <[email protected]>
>>>>> napisał(a):
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> It's less than 24 hours till the end of the voting.
>>>>>> If anyone of you still hesitates, this is the time to share your
>>>>>> opinion with a vote on this AIP :)
>>>>>> Thank you!
>>>>>>
>>>>>> Kind regards,
>>>>>> Mateusz Nojek
>>>>>>
>>>>>>
>>>>>> pt., 18 lut 2022 o 21:36 Tomasz Urbaszek <[email protected]>
>>>>>> napisał(a):
>>>>>>
>>>>>>> +1 (binding)
>>>>>>>
>>>>>>> Tomek
>>>>>>>
>>>>>>

Re: [VOTE] AIP-47 New design of Airflow System Tests

Reply via email to