potiuk commented on issue #44353: URL: https://github.com/apache/airflow/issues/44353#issuecomment-2499517214
Comment and proposal: I think we won't avoid some "airflow.common" eventually, for this reason and for #44352. It falls into the packaging decisions that we deferred, but I think the cleanest way is to have some `common` package. We have all the experience from the providers there, and we should learn from it. This package could **potentially** be embedded into the Task SDK and Core (with a changed package name, for example), but then there is a whole set of problems around what happens if different versions of the same package are embedded in different packages installed at the same time (at least for the Local Executor and standalone cases we will have to have both the Task SDK and "core" installed together). And at the same time we want to update Airflow code independently from the workers' code, which makes it tricky, as we should be able to handle the case where the workers' "common" code is older than the "core" common code.

IMHO, the cleanest and simplest solution is a separate package that comes with Airflow and has a very strict compatibility policy, something we already went through with `common.sql` for example:

a) it should never have a breaking change
b) it should always be backward-compatible (which is another way of saying a)
c) we should have tests that make sure a breaking change did not happen accidentally
d) the Task SDK package should have a `>= SOME_3_X_0` requirement for the common code, and should work with any "future" version of the common code

The (somewhat successful, but likely improvable) attempt at doing this was https://github.com/apache/airflow/blob/main/providers/src/airflow/providers/common/sql/hooks/sql.pyi, which is automatically generated by the `update-common-sql-api-stubs` pre-commit hook. That then delegates the check of whether the interface changed to `mypy`. But IMHO it is far too little (and unnecessary if we attempt to actually test the common code for compatibility).
What really works (and we have it in providers) is running current tests against older versions of the packages we want to test. This has much stronger "compatibility" properties than checking whether the Python API changed, because it also tests the behaviour. So we could use techniques similar to the ones we use for providers now, where we determine the "lowest direct" set of dependencies and run "compatibility" tests of new providers against older Airflow versions. Only here it would have to be more of a "forward" test: we should take the tests from the *OLDER* released Task SDK and run them against the NEWER common code.

This is all doable, I think. The Task SDK and the set of tests it runs should be relatively small, and we should be able to add a CI job that checks out the 3.0, 3.1, 3.2, ... Task SDK tests and runs them against the latest "common" code to see if there are any compatibility issues.

I think it would be good to agree on some rules here: how long we can expect the old "common" to work with the new "core", which would also allow us to remove some old, deprecated code. That will also let us improve and evolve the test code, because it's a bit tricky to check out test code from one branch/version and run it with code from another version. This is the set of test compatibilities we already have in the providers' compatibility tests, and we are only able to maintain it long-term because we have the rule of 6 months of support for old Airflow versions; it would be next to impossible to take `tests` from main and run them on 2.0, for example. We are going to have a very similar situation with running "old Task SDK tests" against "new common code", and we should limit the number of versions we run them against.

That would be my proposal for how to solve it. The providers can be used as a showcase of the kinds of issues we will need to handle, and as proof that it can work.
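To keep that matrix bounded, the CI job could compute which released Task SDK versions still fall inside an agreed support window. A minimal sketch, assuming a hypothetical "last three minor releases" rule; the actual window is exactly the policy that would need to be agreed:

```python
# Hypothetical helper sketching how a CI matrix could limit which released
# Task SDK versions get their tests replayed against the latest common code.
# The "last 3 minors" window is an assumption, not a decided policy.

def compat_matrix(released: list[str], window: int = 3) -> list[str]:
    """Return the released versions whose minor series is still supported."""

    def minor(version: str) -> tuple[int, int]:
        major, minor_, *_ = (int(part) for part in version.split("."))
        return (major, minor_)

    # Keep only the newest `window` minor series.
    supported = sorted({minor(v) for v in released}, reverse=True)[:window]
    return [v for v in released if minor(v) in supported]


# With 3.0-3.3 released, only the newest three minor series get replayed:
print(compat_matrix(["3.0.0", "3.1.0", "3.1.2", "3.2.0", "3.3.0"]))
# -> ['3.1.0', '3.1.2', '3.2.0', '3.3.0']
```

The CI job would then, for each version in the matrix, check out that Task SDK tag's test suite and run it against the common code from `main`, which is the "forward" direction of the provider compatibility tests we already run.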
