[GitHub] [airflow] potiuk commented on pull request #24058: Convert Athena Sample DAG to System Test

GitBox Thu, 02 Jun 2022 04:59:35 -0700


potiuk commented on PR #24058:
URL: https://github.com/apache/airflow/pull/24058#issuecomment-1144780452

I love the way it is implemented @ferruzzi :). I think **maybe** over time
we will generalize some of that to other providers, but I think @bhirsz
explained very well about some of the problems we had when we tried to
generalize too early.

When we initially implemented system tests (more than 3 years ago) we went
the route of "implementing common, generalizable code" and implementing system
tests usign it for google -> and then it turned out that keeping the
generalization working for mutliple diferent tests is kinda difficult and
indeed pollutes the solution a lot.

Since then I got much bigger fan of "in order to make something reusable, it
should be made usable first". So I'd say the right approach here is to "bubble
up" the common utils - first let's see how reusable this approach can be for
other AWS "properties" - not only Athena (we will likely learn that some of
that is not as reusable as we thought) and then let's try to use similar
approach in other providers and see a) whether it is needed there, b) whether
the generalisation brings more problems than it solves.

I think there is no need to get an AIP or anything like that for those -
working code and PRs gradually making stuff re-usable is the best approach.

There is one caveat though. I think (and there are lots of articles and
blogs about it) there is a big difference between the "real application" code
and test code. See for example here:
https://codeshelter.wordpress.com/2011/04/07/dry-and-damp-principles-when-developing-and-unit-testing/

1) For real application code "DRY" is more important than "DAMP"
2) For test code "DAMP" is more important than "DRY"

It's never 0/1 of course - some small level of duplication (less DRY) is ok
even for application code and some small level of "abstraction and
reusabilitty" (less DAMP) is acceptable (pytest fixtures for example are good
example of less DAMP more DRY approach). But generally speaking when I look at
the test code, DAMP is far more important criteria than DRY. I think there is
a very thin lline between " the test can be read and understood on its own with
the common code elsewhere" vs. "I have no idea what the test is doing because
far too many things happen outside of it". This is the main reason for AIP-47
at all - the problem with the generalized approach we had was that it was
rather difficult to understand what the tests were doing because different
parts of the tests were in different places.

There is this really nice property that test is standalone and does not
require pytest to run - just setting one variable. We should keep it.
Generation of the variable if not set is even cooler and if we can make it
generic enough to be reused elsewhere, that would be fantastic.

However we have to remeber to not loose those properties of simplicity and
readabilityh also for some simpler providers - we should see how the code in
AWS can be generalized for those - and if we see a way and it does not make
those tests unnecessary complex and less readable - by all means, let's do it
(but not too early).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [airflow] potiuk commented on pull request #24058: Convert Athena Sample DAG to System Test

Reply via email to