dstandish edited a comment on pull request #13421:
URL: https://github.com/apache/airflow/pull/13421#issuecomment-753694100
hehe no worries :)
i think the example_bash_operator does a good job of illustrating why
example dags should not be used in tests.
in general, tests should have only the elements that are absolutely
necessary to test the behavior they are examining.
let me give an example.
consider this class:
```python
class MyObj:
    def __init__(self, param1=None, param2=None):
        self.param1 = param1
        self.param2 = param2

    def get_param1(self):
        return self.param1
```
consider these tests
```python
def test_get_param1():
    m = MyObj('a')
    assert m.get_param1() == 'a'
```
```python
def test_get_param1():
    m = MyObj('a', 'b')
    assert m.get_param1() == 'a'
```
The first test is better because the value of `param2` is completely
irrelevant to the test. Including `'b'` is just extra noise, and it's a costly
distraction: if the test ever breaks, it's one more thing we have to chase down
to rule out as the cause, and if we ever need to change the test, it's one more
thing we have to understand.
Similarly, when we use `example_bash_operator` as the test dag for many
different tests, we inevitably end up with a lot of extra noise in those tests.
One example is `test_backfill_multi_dates` in `test_backfill_job.py`, which
uses `example_bash_operator`.
There is a section in there that spells out the expected execution order:
```python
expected_execution_order = [
    ("runme_0", DEFAULT_DATE),
    ("runme_1", DEFAULT_DATE),
    ("runme_2", DEFAULT_DATE),
    ("runme_0", end_date),
    ("runme_1", end_date),
    ("runme_2", end_date),
    ("also_run_this", DEFAULT_DATE),
    ("also_run_this", end_date),
    ("run_after_loop", DEFAULT_DATE),
    ("run_after_loop", end_date),
    ("run_this_last", DEFAULT_DATE),
    ("run_this_last", end_date),
]
```
I'd bet this test could be accomplished with fewer tasks. But it uses all
of them because that's what is in `example_bash_operator`. And presumably, as
people have added tasks to that example, they have had to add them to this and
other tests too. By doing so we are spending effort and getting no return on
it. Extraneous content in a test makes it tougher to understand exactly what it
is testing, so adding more tasks makes this test _worse_, not better. It
increases technical debt, and that is not good for the community. If I add
another task to this test merely because I want to demonstrate a new feature in
an example dag, then I make this test unnecessarily worse.
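For illustration, a dedicated, purpose-built dag for this test could be tiny.
Something along these lines (just a sketch; the dag id, task ids, and dates
here are made up, not taken from the actual test suite):
```python
import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

DEFAULT_DATE = datetime.datetime(2021, 1, 1)

# a hypothetical dag that exists only for this test -- two tasks are enough
# to verify that backfill runs every task for every date in the range
with DAG(
    dag_id="test_backfill_multi_dates",
    start_date=DEFAULT_DATE,
    schedule_interval="@daily",
) as dag:
    first = BashOperator(task_id="first", bash_command="echo 1")
    second = BashOperator(task_id="second", bash_command="echo 2")
    first >> second
```
With a dag like that, `expected_execution_order` would shrink to four tuples,
and a reader could tell at a glance what the test is actually verifying.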
Keeping test and example combined is not good for users either. It
handcuffs us when writing example dags, because we are constrained by the tests
that reference them, tests we don't want to modify without good reason.
So in sum, there are two main problems with this approach:
1. test and example dags are the same
   * we are constrained from designing example dags that best exemplify
     features, because they have mixed purposes
   * we can't freely alter an example without dealing with entirely unrelated
     tests
2. too many tests share the same test dag (see the sketch after this list)
   * it makes tests unnecessarily complicated and noisy
   * we can reach a point where the structure required by one test is
     incompatible, or at least very inconvenient, for the needs of another test
   * it's a developer inconvenience -- you need to understand, and alter, many
     different tests entirely unrelated to your change
     - and there is a non-trivial risk that "fixing" a test actually breaks it,
       when it did not really need to change at all for your PR
So, I do believe there are costs incurred in (1) mixing test and example
dags and (2) having many tests share the same test dag. And my question is,
what is the benefit? If there is a return, then great. But I don't see it.
What do you think?