potiuk commented on issue #27292: URL: https://github.com/apache/airflow/issues/27292#issuecomment-1454858434
> the issue with splitting the provider is mostly that no one from Google picked it. Once someone picks it and starts working on it we will be able to overcome the tech difficulties. We don't know yet how the provider will be split but we do know it must be done.

I am not so sure. Actually, using extras might be a much simpler approach that is going to solve most of the pains with getting all the libraries in, I **think**, without introducing the **huge** hassle of extracting common code and using it from multiple "google" providers. If we do split the google provider, then the maintenance pain we had with `common.sql` will absolutely pale in comparison, and there were at least 4 or 5 traps in extracting and maintaining that common code which were really painful to protect against and fix. If we find a way to solve most of the users' dependency problems with extras, as suggested by @r-richmond (which I **think** is actually possible), then, to be honest, I see no reason to split the provider.

Splitting the google provider would be a massive undertaking. If we do it, it will take us more than a few iterations across multiple providers to solve the teething problems we will not anticipate when splitting, and those problems will keep coming back for as long as the common part of the google provider keeps evolving: every time we release new common code, we will keep breaking things for older versions of the "specific" providers. It is all but certain that this will happen, and we have almost no way to protect against it.
Look how small the `common.sql` "API" surface was, and how many problems we had:

* https://github.com/apache/airflow/pull/25430
* https://github.com/apache/airflow/pull/25822
* https://github.com/apache/airflow/pull/25855
* https://github.com/apache/airflow/pull/25939
* https://github.com/apache/airflow/pull/26758
* https://github.com/apache/airflow/pull/26761
* https://github.com/apache/airflow/pull/26944
* https://github.com/apache/airflow/pull/27599
* https://github.com/apache/airflow/pull/27843
* https://github.com/apache/airflow/pull/27854
* https://github.com/apache/airflow/pull/27912
* https://github.com/apache/airflow/pull/28744

Not all of those, but most, were directly caused by the decision to extract common code for a number of SQL operators. And the main reason those errors affected users is that there is no way to test a new release of the "common" code against all possible releases of all the providers that use it. At most you can semi-thoroughly test the latest versions of the providers and the common code together. This is what we do. That's why splitting the google provider is SCARY: it would bring an order of magnitude more problems like these, and we would have no way to avoid them.

And there is more. The Google common code will keep evolving at a much faster rate than the `common.sql` code. Our problems with `common.sql` stopped the moment it stopped changing, but the Google common code will never stop changing. So the decision about splitting the google provider is not as "light" as you think, and that's why I am very, very sceptical about splitting it (otherwise I would have done it myself a long time ago).

Of course, using extras does not solve "all" problems, but I think it solves most. It won't solve the case where you would like to use one provider version for one Google service and a different version for another.
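A minimal sketch of the extras idea, assuming a hypothetical layout in which a single Google provider package declares one extra per Google service. The extra names, the "core" list and the groupings below are invented for illustration and are not the provider's real metadata; the distribution names are real Google client libraries on PyPI. The dictionary is the kind of mapping setuptools accepts through `setup(extras_require=...)`, and `resolve_requirements` is a hypothetical helper that mimics what an installer would pull in for the selected extras.

```python
# Hypothetical per-service extras for ONE Google provider package.
# Extra names and groupings are illustrative assumptions, not the real
# apache-airflow-providers-google metadata.
CORE_DEPS = ["google-auth", "google-api-core"]

EXTRAS_REQUIRE = {
    "bigquery": ["google-cloud-bigquery"],
    "gcs": ["google-cloud-storage"],
    "pubsub": ["google-cloud-pubsub"],
}
# Convenience extra that pulls in every service library at once.
EXTRAS_REQUIRE["all"] = sorted({dep for deps in EXTRAS_REQUIRE.values() for dep in deps})


def resolve_requirements(selected_extras):
    """Return the sorted list of distributions installed for the chosen extras."""
    unknown = set(selected_extras) - set(EXTRAS_REQUIRE)
    if unknown:
        raise ValueError(f"unknown extras: {sorted(unknown)}")
    deps = set(CORE_DEPS)  # always installed, whatever extras are chosen
    for extra in selected_extras:
        deps.update(EXTRAS_REQUIRE[extra])
    return sorted(deps)


if __name__ == "__main__":
    print(resolve_requirements(["bigquery"]))
    # ['google-api-core', 'google-auth', 'google-cloud-bigquery']
```

Under this assumed layout, `pip install "apache-airflow-providers-google[bigquery]"` would install only the core libraries plus `google-cloud-bigquery`, with no common code to extract. Every extra still resolves against the same provider release, which is exactly the limitation mentioned above: extras let users skip libraries they don't need, but cannot give two services different provider versions.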
But, to be honest, if we get to the point where someone needs different provider versions for different Google services, then we have a bigger issue, and supporting that is one of those things that leads to more issues than it solves. I would very strongly prefer the situation where users have to modify their DAGs for Google if they want to (for example) use new features from another service. Yes, it's a bit of pain for them, but far, far less pain for everyone else (including them) in the future, when incompatibilities in the common code would cause even more problems.
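The testing problem with shared common code can be made concrete by counting. A minimal sketch, assuming a hypothetical split into four service-specific providers that each already have five releases (the provider names and version numbers are invented for illustration): a new release of the shared code could meet any combination of installed provider versions, while CI that tests only the latest providers exercises a single combination.

```python
import math
from itertools import product

# Hypothetical release history after a split: four service-specific providers,
# each with five published versions (listed oldest to newest), all depending on
# one shared common package. Names and versions are invented for illustration.
PROVIDER_RELEASES = {
    "google-bigquery": ["1.0.0", "1.1.0", "2.0.0", "2.1.0", "3.0.0"],
    "google-gcs": ["1.0.0", "1.0.1", "1.1.0", "2.0.0", "2.0.1"],
    "google-pubsub": ["1.0.0", "1.1.0", "1.2.0", "1.3.0", "2.0.0"],
    "google-dataproc": ["1.0.0", "2.0.0", "2.1.0", "3.0.0", "3.1.0"],
}


def installed_combinations(provider_releases):
    """Every mix of provider versions a user might run next to a new common release."""
    return list(product(*provider_releases.values()))


def latest_only(provider_releases):
    """The single combination a CI run testing only the newest providers covers."""
    return [tuple(versions[-1] for versions in provider_releases.values())]


if __name__ == "__main__":
    combos = installed_combinations(PROVIDER_RELEASES)
    tested = latest_only(PROVIDER_RELEASES)
    # Sanity check: the total is the product of the per-provider release counts.
    assert len(combos) == math.prod(len(v) for v in PROVIDER_RELEASES.values())
    print(f"{len(combos)} possible combinations, {len(tested)} tested")
    # prints "625 possible combinations, 1 tested"
```

Each further release of any provider multiplies the total, and the whole matrix applies again to every new release of the shared code, so testing only the latest versions leaves almost every combination unexercised.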
