dstandish edited a comment on pull request #13421:
URL: https://github.com/apache/airflow/pull/13421#issuecomment-753694100


   hehe no worries :)
   
   i think the example_bash_operator does a good job of illustrating why 
example dags should not be used in tests.
   
   in general, tests should have only the elements that are absolutely 
necessary to test the behavior they are examining.
   
   let me give an example.
   
   consider this class:
   ```python
   class MyObj:
       def __init__(self, param1=None, param2=None):
           self.param1 = param1
           self.param2 = param2
           
       def get_param1(self):
           return self.param1
   ```
   
   consider these tests
   
   ```python
   def test_get_param1():
       m = MyObj('a')
       assert m.get_param1() == 'a'
   ```
   
   ```python
   def test_get_param1():
       m = MyObj('a', 'b')
       assert m.get_param1() == 'a'
   ```
   
   The first test is better because the value for `param2` is totally 
irrelevant to the test.  Including 'b' is just extra noise, and it is in fact a 
costly distraction because if there is something wrong with the test, it's 
another thing we have to chase down to see if it is the cause.  And if we need 
to make a change, then it's another thing we need to understand.
   
   Similarly, when we use example_bash_operator as a test dag for many 
different tests, inevitably we end up with a lot of extra noise in tests.  An 
example of this is in test `test_backfill_multi_dates` in 
`test_backfill_job.py`.  It uses example bash operator.
   
   There is a component in there that spells out expected execution order:
   ```python
           expected_execution_order = [
               ("runme_0", DEFAULT_DATE),
               ("runme_1", DEFAULT_DATE),
               ("runme_2", DEFAULT_DATE),
               ("runme_0", end_date),
               ("runme_1", end_date),
               ("runme_2", end_date),
               ("also_run_this", DEFAULT_DATE),
               ("also_run_this", end_date),
               ("run_after_loop", DEFAULT_DATE),
               ("run_after_loop", end_date),
               ("run_this_last", DEFAULT_DATE),
               ("run_this_last", end_date),
           ]
   ```
   
   Almost certainly this test could be accomplished with far fewer tasks.  But 
it uses all of them because that's what is in example_bash_operator.  And 
probably as people have added to that example, they add to this and other tests 
test.  But by doing so, we are spending effort, and not getting any return on 
our effort.  Extraneous content in a test makes it tougher to understand 
exactly what it is testing.  So adding more tasks to this test makes this test 
_worse_, not better.  It increases technical debt.  And that is not good for 
the community.
   
   Keeping test and example combined is not good for users either.   In 
handcuffs us with regard to writing example dags because we are constrained by 
the tests that reference it, tests which we don't want to modify without good 
reason.
   
   So in sum there are two main problems with this approach:
   
   1. test and example dags are the same
       * constrained from designing example dags to best exemplify features 
because they have mixed purposes
       * can't freely alter an example without dealing with entirely unrelated 
tests
   2. having too many tests share the same test dag
      * makes tests unnecessarily complicated and noisy
      * can reach a point where the structure required by one test is 
incompatible, or at least very inconvenient, when trying to meet the needs of 
another test
      * developer inconvenience -- need to understand, and alter, many 
different tests, entirely unrelated to your change
          - non-trivial risk that "fixing" a test actually breaks it, when did 
not really need to change it at all for your PR
   
   So, I do believe there are problems with (1) mixing test and example dags 
and (2) having many tests share the same test dag.  And my question is, what is 
the benefit?  If there is a return, then great.  But I don't see it.
   
   What do you think?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to