potiuk commented on issue #15933:
URL: https://github.com/apache/airflow/issues/15933#issuecomment-970181315


   I think *should* is the key problem here. The initial split is quite easy,
but what would happen after a few releases is scary as hell. "Keep backwards
compatibility of commons" in this case is quite a bit of wishful thinking.
It's far easier said than done: it requires constant vigilance and fixes of
accidental incompatibilities that will eventually creep in, leading to
uncontrollable growth in complexity.
   
   The problem is when people start refactoring and add code that will break
the compatibility accidentally, or when you want to do a refactor that will
improve the common code but introduces compatibility issues.
   
   We already saw examples of that with db_api: see the comment here: 
https://github.com/apache/airflow/blob/main/airflow/hooks/dbapi.py#L45
   This is just one class with a few methods, and I personally recall at least
3 cases where changes were almost merged (or even merged) that would have
accidentally broken the compatibility of existing, released providers (in all
compatible versions).
   Splitting the Google provider is the same but potentially an order of
magnitude worse, as there is much more common code than that one class.
   
   People making changes to such common code (and often even
reviewers/maintainers) might not realise the consequences of their changes on
already released packages. Some changes will accidentally break compatibility
even if you are careful. It basically requires that all the released
google-providers are FUTURE compatible with all the released versions of the
"common" package.
   
   Unless you have a full test suite that can handle various cases and makes
sure that the "common package" works with ALL already released and compatible
providers that people have, there is no way to "make sure" it's the case. And
even if we can add such a test suite (which is possible to some extent, just
very costly to maintain and run), this prevents you from doing more "bold"
refactorings, which is generally a very bad side effect of such an approach.
I think the ability to refactor code is crucial to maintainability.
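
   To give a rough idea of what such a test suite has to cover, here is a
minimal sketch. The package names and version lists are hypothetical, not real
releases; the point is only that every released "common" version has to be
checked against every released provider version that claims compatibility.

```python
from itertools import product

# Hypothetical released versions, for illustration only.
COMMON_VERSIONS = ["1.0", "1.1", "1.2"]
PROVIDER_VERSIONS = {
    "google-ads": ["1.0", "2.0"],
    "google-gcs": ["1.0", "2.0", "3.0"],
}


def compatibility_matrix(common_versions, provider_versions):
    """Return every (common, provider, provider_version) combination to test."""
    return [
        (common, name, version)
        for common, (name, versions) in product(
            common_versions, provider_versions.items()
        )
        for version in versions
    ]


matrix = compatibility_matrix(COMMON_VERSIONS, PROVIDER_VERSIONS)
print(len(matrix))  # number of install-and-test jobs CI would need
```

   Every new release of either side adds a whole row or column of jobs, which
is where the maintenance and running cost comes from.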
   
   For example now, we have such a test suite for all providers - we make sure 
that they import without warnings on Airflow 2.1 in our CI. But this is not a 
full guarantee they will work with Airflow 2.1 - this test is just a "smoke" 
test - but it already caught at least 2 cases of seemingly "innocent" changes
that would make all such providers stop working on 2.1.
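
   A smoke test of this kind can be sketched roughly as below. This is not the
actual Airflow CI code, just an illustration of the idea under my own
assumptions: the helper name `import_without_warnings` is made up, and the
module list passed to it would be the provider modules in a real setup.

```python
import importlib
import warnings


def import_without_warnings(module_names):
    """Import each module and report those that fail or emit warnings."""
    failures = {}
    for name in module_names:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # do not let filters hide warnings
            try:
                importlib.import_module(name)
            except Exception as exc:
                failures[name] = f"import failed: {exc!r}"
                continue
        if caught:
            failures[name] = f"{len(caught)} warning(s): {caught[0].message}"
    return failures


# Standard-library stand-ins for provider modules (hypothetical list).
result = import_without_warnings(["json", "no_such_module_xyz"])
print(result)
```

   Note that modules already present in `sys.modules` are not re-imported, so a
check like this only makes sense in a fresh interpreter. And as said above, it
only proves that the import works, not that the provider behaves correctly.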
   
   Of course you can also introduce "breaking" changes in the common code (and
release a 2.0 package), but then this inevitably leads to one of two things:
   
   1) you also release all dependent packages with a "breaking" release, which
is effectively equivalent to releasing a new "google provider" release today.
   
   2) you have to maintain compatibility in all the dependent packages (they
should work with both 1.0 and 2.0 of the commons), which leads to messy code
and will break eventually as you add more changes. It can only be maintained
for a short time and eventually it leads to 1) - you have to say at some point
in time that "provider google-x.N" only works with "commons-2.0" and above.
Just maintaining those dependencies is a pain and you need dedicated people
who would keep an eye on those dependencies.
   
   The question here is what is more costly (and for whom):
   
   a) complexity of maintaining compatibility between different versions of
common packages and released "google" packages, with the potential ability
for the user to upgrade only some parts
   
   or 
   
   b) complexity for the users, who have to handle breaking changes
potentially more frequently with each new "full google provider" release
   
   I think a) is something that will grow more and more complex for
maintainers over time, whereas b) is kinda "stable" - it requires some regular
effort from the users but in the long run it is easier for them to handle.
Also a) has one very uncomfortable property, for an open-source project at
least. I immediately imagine many issues opened by the users: "I want to
install google-ads-6 and google-gcs-3 because this and that, and they do not
work together because they require different "commons"". Just conversations
about that and explaining what can and cannot be done will take a good chunk
of time for the maintainers who know how it works. Explaining that in docs
will be next to impossible, I am afraid. Right now we avoid all those
conversations by releasing a single google provider. The conversation is very
simple: "if you want to use google ads which were added in provider 5.0 you
need to also adapt all other google properties to the version that is there".
End of story. By having multiple versions, the number of user stories here
grows exponentially.
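
   The kind of conflict I mean can be sketched as follows. All the release
names and "commons" ranges here are made up for illustration; a real resolver
(pip) does much more, but the basic check is whether the required ranges
intersect at all.

```python
# Hypothetical requirements:
# provider release -> (minimum commons, maximum commons, exclusive)
REQUIRES_COMMONS = {
    ("google-ads", "6"): ((2, 0), (3, 0)),
    ("google-gcs", "3"): ((1, 0), (2, 0)),
    ("google-gcs", "4"): ((2, 0), (3, 0)),
}


def common_range(releases):
    """Intersect the commons ranges; return None if nothing satisfies all."""
    low = max(REQUIRES_COMMONS[release][0] for release in releases)
    high = min(REQUIRES_COMMONS[release][1] for release in releases)
    return (low, high) if low < high else None


# google-ads-6 needs commons 2.x, google-gcs-3 needs commons 1.x: no overlap.
conflict = common_range([("google-ads", "6"), ("google-gcs", "3")])
# Upgrading gcs as well makes the combination installable again.
compatible = common_range([("google-ads", "6"), ("google-gcs", "4")])
print(conflict, compatible)
```

   With a single provider package this question never comes up, because
there is only one version to pick.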
   
   Also I think in the long term you will not avoid the "breaking all"
releases anyway. Users will have to do it anyway - only it will cost them
more because they will have to do it much less frequently
(counter-intuitively). There is an old saying that if migration is painful -
just do it more often. The "split providers" approach leads to potentially
less frequent, but more painful upgrades for our users, and adds maintenance
effort for committers. The "single provider" means that you potentially must
do more frequent but less painful upgrades (say every month when you upgrade
the google provider). But there is no solution with "no pain".
   
   I am not saying all this is impossible - technically splitting the google
provider is possible. I just think that the person (or rather group) who
commits to doing it should realise the consequences and take on the burden of
organising it in such a way that we do not land in "dependency hell". I
personally do not currently have enough courage to commit to it, to be honest.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.