Sounds good. While I don't think we should enforce the "watcher" task for
all the example DAGs, it isn't a blocker.

+1 (binding)

On Tue, 22 Feb 2022 at 07:43, Mateusz Nojek <[email protected]> wrote:

> Hello Kaxil and thanks for the questions.
> Let me answer them...
>
>    1. We don't, but since we want to have one file that will be an
>    example DAG and also a system test at the same time, we need to decide
>    where to store them. We have 2 options: store them as tests somewhere in
>    `tests` directory or store them as example DAGs (e.g. in providers
>    directory). Since they fulfill their 'example' role using dynamic
>    documentation that is generated from code (no matter where the files are
>    stored), this purpose we have already covered. The 'test' role, on the
>    other hand, is fulfilled by storing them in the right place, indicating
>    that these are actually tests that can be executed. Also, we decided to
>    move them because since we are planning to do the redesign, we can find a
>    better place for them. `tests/system/providers` seems like a proper place
>    for system tests.
>    2. I am not for or against setting up the DebugExecutor inside the
>    environment variable, but I would like to see strong arguments against
>    using it the way it is described in the AIP, so that it can possibly break
>    the execution in some specific cases. Examples would be very desired.
>    3. You're right, if all tasks do not have any trigger rules, we don't
>    need the watcher task. But since probably most of them will have teardown
>    tasks (tasks with trigger_rule set to "all_done" to do the cleanup after
>    test) we need to introduce this watcher task to handle those. We actually
>    don't need to use the watcher task in DAGs without teardown tasks, but it's
>    actually better to use it always because it doesn't provide anything other
>    than to support the possibility of having a teardown task, so if anyone in
>    the future decides to add some task to remove some resources, watcher task
>    will be already there to handle it. It doesn't bring that much complexity
>    to the code and computational consumption. Also, we are going to have a
>    precommit hook that will check against absence of this piece of code and
>    assure that it's there in every test. Of course, it's only my opinion and
>    if we later decide to not use watcher task where it is not essentially
>    needed, we can avoid adding it, but having it everywhere creates a kind of
>    'framework' for our tests and saves us from possible mistakes.
>
> One of the concepts of the new design is to minimize the amount of
> environment variables to maintain. When I was fixing and refactoring system
> tests, the part that I found the most difficult was maintaining the list of
> environment variables. Even if every possible variable has the "good"
> default, it was not running successfully on the first run. I needed to
> check all the values of variables and see if they are sensible. They
> weren't. The problem is that they age very fast and with every change to
> the Airflow, they become brittle, outdated, useless. One of the main
> pillars of the new design of system tests is to minimize the number of
> environment variables needed to be maintained so that the DAGs just work
> without that much hassle required to configure them.
>
> I hope it delivers answers your concerns.
>
> Kind regards,
> Mateusz Nojek
>
>
> pon., 21 lut 2022 o 18:31 Kaxil Naik <[email protected]> napisał(a):
>
>> Apologies for chiming in late. I read the AIP now, please find my
>> questions/comments below:
>>
>>    1. Why do we need to move the DAGs from example_dags ?
>>    2. Hardcoding DebugExecutor using environment variables like that
>>    doesn't feel right. Might cause issues when running with other executors.
>>    3. The Watcher task - why do we need that? If the example DAGs don't
>>    set any Trigger rules, the default Trigger rule is "all_success" so the
>>    DagRun will fail if any task fails.
>>
>> The only thing I think is needed to make the example DAGs actually run in
>> CI/locally is making sure the values comes from Environment Variables while
>> the DAG itself have sensible defaults.
>>
>> For instance, the following example should work fine if the environment
>> variables are set.
>>
>> AIRFLOW__CORE__EXECUTOR=DebugExecutor
>> python 
>> airflow/providers/google/cloud/example_dags/example_gcs_timespan_file_transform.py
>>
>> What would be ideal (according to me) is to have airflow test run work
>> without execution_date which will take care of running it via
>> DebugExecutor. (docs
>> <https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#test>
>> )
>>
>> This opens up the possibility for us in future to provide more useful
>> debug logs (like import time, number of DAGs, warn when a DAG doesn't
>> follow best practices etc) via the "airflow test" command. While this is
>> out of scope of the current AIP, I feel it might be a great value add.
>>
>> Just my 2-cents.
>>
>> Regards,
>> Kaxil
>>
>>
>>
>> On Mon, 21 Feb 2022 at 07:20, Mateusz Nojek <[email protected]> wrote:
>>
>>> Hello everyone,
>>>
>>> Since there was no response from the community during the weekend on
>>> this voting, I am extending it for the next 48 hours, so it will end on
>>> Wednesday 23rd February 9 AM CET.
>>>
>>> Current state is:
>>> 2/3 required binding votes,
>>> additional 4 non-binding votes.
>>>
>>> Kind regards,
>>> Mateusz Nojek
>>>
>>>
>>> niedz., 20 lut 2022 o 12:01 Mateusz Nojek <[email protected]>
>>> napisał(a):
>>>
>>>> Hi everyone,
>>>>
>>>> It's less than 24 hours till the end of the voting.
>>>> If anyone of you still hesitates, this is the time to share your
>>>> opinion with a vote on this AIP :)
>>>> Thank you!
>>>>
>>>> Kind regards,
>>>> Mateusz Nojek
>>>>
>>>>
>>>> pt., 18 lut 2022 o 21:36 Tomasz Urbaszek <[email protected]>
>>>> napisał(a):
>>>>
>>>>> +1 (binding)
>>>>>
>>>>> Tomek
>>>>>
>>>>

Reply via email to