potiuk commented on PR #27843:
URL: https://github.com/apache/airflow/pull/27843#issuecomment-1324696134

   > @potiuk I'm not entirely following why this broke (would like to know so I 
don't do it again in the future!); when I removed the method, I added the code 
back in to the BigQuery version of the operators 
[here](https://github.com/apache/airflow/pull/26761/files#diff-529929b4ca60ce73b8da0f45d8a5c43c2d4e391b913fe78b39892899f812951eL617-R621).
 But that doesn't seem to be enough? Do certain versions just not have this 
change available? And what checks would ensure that versions across providers 
will match when updates like this happen? (i.e. should I have made the BigQuery 
changes in a Google providers specific PR?)
   
   The problem had several steps and I tink this is a good learning for us for 
such common code for things that we should:
   
   a) look after during the review
   b) add automated protection for in common providers
   
   Again - this is something that we (or at least I) kind of expected when we 
started the common.sql journey. We picked common.sql as relatively small and 
one that we will be able to fix rather quickly if problems happen, but it had 
already shown us a number of scenarios that neither authors of changes, nor 
reviewers were aware of. "Common code" in separate providers is not an easy 
one, has multiple traps and unobvious problems - treating "separate comon code" 
lightly without realising those laads to issues like that - mainly because the 
code is both separate and coupled. The problems starts where coupling is 
unintended and not obvious (this was this case).
   
   I treat that as a very good learning so let me share the sequence of events 
(again @denimalpaca - do not treat it as somethign that you are "guilty" of - 
it happened that you were bold enough to make the changes, but you are as 
guilty as we as reviewers who failed to spot the problem.
   
   Sequence of events:
   
   1) Change adding columns in Google Provider #26368: 
   
   The added code that caused the problem:
   
![image](https://user-images.githubusercontent.com/595491/203497793-bc4de67c-6530-43f0-9b57-a298f419af80.png)
   
   This change adds coupling between BigQuery part of google provider and 
common.sql's in an unintended way. Using both `_get_failed_checks,` and 
`parse_boolean` which were not really intended to be part of the API.
   
   2) Google provider 8.4.0 has been releaased with those imports. That made it 
"official". At the moment google provider 8.4.0 has been released, those two 
methods became "API" - because they had a user (google provider 8.4.0) - and 
that user expects the methods to be there in the future version of 1.* version 
of common.sql (backwards compatibility)
   
   3) The https://github.com/apache/airflow/pull/26761 removed both methods 
from common.sql
   
   4) We released common.sql 1.3
   
   Now everyone who has 8.4.0 of Google Provider (many users) and upgrades 
common.sql to 1.3 (for example because they upgrade any other provider that has 
dependency on 1.3 new features) will break google provider.. Thy can fix it by 
upgrading google provider to 8.5.0, yes, but still the effect will be that by 
upgrading (for example) Presto, thy will break google BigQuery. We want to 
avoid that situation to escalate.
   
   
   The solution for the future:
   
   1) pay more attention during development and review and make sure we only 
use stuff that is intended to be backwards compatible (part of the API)
   
   2) pay more attention when removing code to see if there are any users of 
even the things that we **think** are not part of the API
   
   3) automate check for API `contracts` - i.e. implement automated checks that 
will catch and detect mis-use of non-public "API" of comon code (we already 
have some of that for a number of "core" features actually - one of those 
checks is 
https://github.com/apache/airflow/blob/main/scripts/ci/pre_commit/pre_commit_check_provider_airflow_compatibility.py
  - we need to have similar for "core" part ) 
   
   This is actually [Hyrum's law](https://www.hyrumslaw.com/) in full swing 
(only we did it ourselves with our own internal user - google provider):
   
   > With a sufficient number of users of an API, it does not matter what you 
promise in the contract: all observable behaviors of your system will be 
depended on by somebody.
   
   In this case both  `_get_failed_checks,` and `parse_boolean` were not 
**supposed** to be depended on, but unfortunately, they were.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to