potiuk commented on issue #44353:
URL: https://github.com/apache/airflow/issues/44353#issuecomment-2499517214

   Comment and proposal:
   
   I think we won't avoid some "airflow.common" eventually, for the reasons 
described here and in #44352. It falls into the packaging decisions that we 
deferred, but I think the cleanest way is to have some `common` package.
   
   We have all the relevant experience from providers, and we should learn 
from it.
   
   This package could **potentially** be embedded into the Task SDK and Core 
(with a changed package name, for example), but then there is a whole set of 
problems around what happens when different versions of the same package are 
embedded in different packages installed at the same time (at least for the 
Local Executor and standalone cases, we will have to have both the Task SDK 
and "core" installed together). At the same time, we want to update Airflow 
code independently from the workers' core, which makes it tricky: we should 
be able to handle the case where the workers' "common" code is older than the 
"core" common code.
   
   IMHO, the cleanest and simplest solution is a separate package that ships 
with Airflow and has a very strict compatibility policy - something we 
already went through with `common.sql`, for example:
   
   a) it should never have a breaking change
   b) it should always be backwards-compatible (which is another way of saying a)
   c) we should have tests that make sure a breaking change did not happen 
accidentally
   d) the Task SDK package should have a `>= SOME_3_X_0` requirement for the 
common code - and should work with any "future" version of the common code
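   To make rules a) and b) concrete, here is a minimal sketch of what "never 
break, only deprecate" usually looks like in practice. The helper names are 
made up for illustration - they are not real `airflow.common` API:

```python
import warnings


def serialize_asset(asset):
    """Current name of a hypothetical common helper."""
    return {"uri": getattr(asset, "uri", str(asset))}


def serialize_dataset(dataset):
    """Old name kept as a thin alias: callers built against an older
    Task SDK keep working; removing it would be a breaking change."""
    warnings.warn(
        "serialize_dataset is deprecated, use serialize_asset",
        DeprecationWarning,
        stacklevel=2,
    )
    return serialize_asset(dataset)
```

   The old name can only go away once the agreed support window for old Task 
SDK versions has passed.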
   
   A (somewhat successful, but likely improvable) attempt at doing this was 
https://github.com/apache/airflow/blob/main/providers/src/airflow/providers/common/sql/hooks/sql.pyi
 which is automatically generated by the `update-common-sql-api-stubs` 
pre-commit hook, and which delegates the check for interface changes to 
`mypy`. But IMHO that is far too little (and unnecessary if we actually test 
the common code for compatibility).
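   The stub-plus-mypy approach essentially snapshots the public API surface 
and flags drift. As a rough illustration only (this is not the pre-commit 
code), the same idea can be expressed with `inspect`: record the signatures 
of a module's public callables, then report removals and signature changes as 
potentially breaking while treating additions as fine:

```python
import inspect


def api_snapshot(module):
    """Record the signature of every public callable in a module --
    roughly what a generated .pyi stub pins down."""
    snapshot = {}
    for name in dir(module):
        if name.startswith("_"):
            continue
        obj = getattr(module, name)
        if callable(obj):
            try:
                snapshot[name] = str(inspect.signature(obj))
            except (TypeError, ValueError):
                pass
    return snapshot


def api_drift(old, new):
    """Removed names or changed signatures are potentially breaking;
    newly added names are allowed under a 'never break' policy."""
    breaking = {}
    for name, sig in old.items():
        if name not in new:
            breaking[name] = "removed"
        elif new[name] != sig:
            breaking[name] = f"signature changed: {sig} -> {new[name]}"
    return breaking
```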
   
   What really works (and we have it in providers) is running current tests 
against older versions of the packages we want to test - this has much 
stronger "compatibility" properties than checking whether the Python API 
changed, because it also tests the behaviour.
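   A toy example of why behaviour tests catch more than API checks (the 
helpers below are hypothetical, not real common code): the two versions have 
identical signatures, so a stub/mypy comparison sees no difference, but a 
behavioural test shipped with an older Task SDK fails against the second one:

```python
def render_template_v1(template, context):
    # Old behaviour: a missing key raises KeyError.
    return template.format(**context)


def render_template_v2(template, context):
    # New behaviour: missing keys are silently left blank --
    # same signature, different contract.
    class _Blank(dict):
        def __missing__(self, key):
            return ""

    return template.format_map(_Blank(context))


def missing_key_raises(render):
    """A behavioural check an older Task SDK test suite might ship:
    rendering with a missing key is expected to raise KeyError."""
    try:
        render("{run_id}", {})
    except KeyError:
        return True
    return False
```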
   
   So we could potentially use techniques similar to those we use for 
providers now, where we determine the "lowest direct" set of dependencies and 
run "compatibility" tests of new providers against older Airflow versions - 
only here it would have to be more of a "forward" test: we should take the 
tests of the *OLDER* released Task SDK and run them against the NEWER common 
code. This is all doable, I think - the Task SDK and the set of tests it runs 
should be relatively small, and we should be able to add a CI job that checks 
out the 3.0, 3.1, 3.2, .... Task SDK tests and runs them against the latest 
"common" code to see if there are any compatibility issues.
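   A sketch of what the driver for such a CI job could look like. The tag 
names and checkout paths are purely illustrative (not real repo refs), and 
the actual `git worktree` / checkout step is left out - this only shows the 
matrix shape:

```python
import subprocess


def sdk_test_matrix(released_minors):
    """Build (ref, command) pairs: for each released Task SDK minor
    version, run its checked-out tests against the current common code.
    Refs and paths here are hypothetical."""
    jobs = []
    for minor in released_minors:
        ref = f"task-sdk/{minor}"
        jobs.append((ref, ["pytest", f"old-sdk-{minor}/tests"]))
    return jobs


def run_matrix(jobs, runner=subprocess.run):
    # In real CI, each job would first check out the old ref into its
    # own worktree; here we only dispatch the test commands.
    for _ref, cmd in jobs:
        runner(cmd, check=True)
```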
   
   I think it would be good to agree on some rules here - how long we can 
expect the old "common" to work with the new "core" - and to allow removing 
some old, deprecated code. That will also let us improve and evolve the test 
code, because it's a bit tricky to check out test code from one 
branch/version and run it against code from another version. This is the set 
of test compatibility constraints we already have in the providers' 
compatibility tests - and we are only able to maintain it over the long term 
because we have the rule of 6 months of support for old Airflow versions; it 
would be next to impossible to take main `tests` and run them on 2.0, for 
example. We are going to have a very similar situation with running "old Task 
SDK tests" against "new common code", so we should limit the number of 
versions we run them with.
   
   That would be my proposal for how to solve it - and providers can be used 
as a showcase of the kinds of issues we will need to handle, and as evidence 
that it can work.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.