dstandish edited a comment on pull request #13421:
URL: https://github.com/apache/airflow/pull/13421#issuecomment-753694100
hehe no worries :)
i think the example_bash_operator does a good job of illustrating why
example dags should not be used in tests.
in general, tests should have only the elements that are absolutely
necessary to test the behavior they are examining.
let me give an example.
consider this class:
```python
class MyObj:
    def __init__(self, param1=None, param2=None):
        self.param1 = param1
        self.param2 = param2

    def get_param1(self):
        return self.param1
```
consider these tests
```python
def test_get_param1():
    m = MyObj('a')
    assert m.get_param1() == 'a'
```
```python
def test_get_param1():
    m = MyObj('a', 'b')
    assert m.get_param1() == 'a'
```
The first test is better because the value of `param2` is completely
irrelevant to the test. Including `'b'` is just extra noise, and it's a costly
distraction: if the test ever breaks, it's one more thing we have to chase down
to rule out as the cause, and if we ever need to change the test, it's one more
thing we have to understand.
Similarly, when we use `example_bash_operator` as the test dag for many
different tests, we inevitably end up with a lot of extra noise in those tests.
One example is `test_backfill_multi_dates` in `test_backfill_job.py`, which
uses `example_bash_operator`.
There is a section in there that spells out the expected execution order:
```python
expected_execution_order = [
    ("runme_0", DEFAULT_DATE),
    ("runme_1", DEFAULT_DATE),
    ("runme_2", DEFAULT_DATE),
    ("runme_0", end_date),
    ("runme_1", end_date),
    ("runme_2", end_date),
    ("also_run_this", DEFAULT_DATE),
    ("also_run_this", end_date),
    ("run_after_loop", DEFAULT_DATE),
    ("run_after_loop", end_date),
    ("run_this_last", DEFAULT_DATE),
    ("run_this_last", end_date),
]
```
I'd bet this test could be accomplished with fewer tasks. But it uses all
of them because that's what is in `example_bash_operator`. And presumably, as
people have added tasks to that example, they have had to add them to this and
other tests too. By doing so we are spending effort and getting no return on
it. Extraneous content in a test makes it tougher to understand exactly what it
is testing, so adding more tasks makes this test _worse_, not better. It
increases technical debt, and that is not good for the community. If I add
another task to this test merely because I want to demonstrate a new feature in
an example dag, then I make this test unnecessarily worse.
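For illustration, a dedicated, purpose-built dag for this test could be tiny.
Something along these lines (just a sketch; the dag id, task ids, and dates
here are made up, not taken from the actual test suite):
```python
import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

DEFAULT_DATE = datetime.datetime(2021, 1, 1)

# a hypothetical dag that exists only for this test -- two tasks are enough
# to verify that backfill runs every task for every date in the range
with DAG(
    dag_id="test_backfill_multi_dates",
    start_date=DEFAULT_DATE,
    schedule_interval="@daily",
) as dag:
    first = BashOperator(task_id="first", bash_command="echo 1")
    second = BashOperator(task_id="second", bash_command="echo 2")
    first >> second
```
With a dag like that, `expected_execution_order` would shrink to four tuples,
and a reader could tell at a glance what the test is actually verifying.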
Keeping test and example combined is not good for users either. It
handcuffs us when writing example dags, because we are constrained by the tests
that reference them, tests we don't want to modify without good reason.
So in sum, there are two main problems with this approach:
1. test and example dags are the same
   * we are constrained from designing example dags that best exemplify
     features, because they have mixed purposes
   * we can't freely alter an example without dealing with entirely unrelated
     tests
2. too many tests share the same test dag (see the sketch after this list)
   * it makes tests unnecessarily complicated and noisy
   * we can reach a point where the structure required by one test is
     incompatible, or at least very inconvenient, for the needs of another test
   * it's a developer inconvenience -- you need to understand, and alter, many
     different tests entirely unrelated to your change
     - and there is a non-trivial risk that "fixing" a test actually breaks it,
       when it did not really need to change at all for your PR
So, I do believe there are costs incurred in (1) mixing test and example
dags and (2) having many tests share the same test dag. And my question is,
what is the benefit? If there is a return, then great. But I don't see it.
What do you think?