potiuk commented on a change in pull request #17423:
URL: https://github.com/apache/airflow/pull/17423#discussion_r682966113



##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       The problem is that having `schema` as attribute in DBApiHook will 
encourage people to use it in their other operators deriving from DBApi. It 
will be rather difficult to make sure that everyone is using it in "hasattr" 
way. Even if we two remember it now, we will forget about it and other 
committers who are not involved will not even know about this. We could 
potentially automate a pre-commit check if the "DBApiHook" is accessed with 
hasattr, but I think we will never be able to do it with 100% accuracy and I 
think it's simply not worth it for that single field, that can be simply 
replaced with single line for each operator that wants to use it (in __init__):
   
   ```
           self.schema: Optional[str] = kwargs.pop("schema", None)
   ```
   
   I think we should think about DBApiHook as "Public" API of Airflow and any 
change to it should be very, very carefully considered.
   
   Another option would be to add `>=Airflow 2.2` limitation to Postgres 
operator (and any other operator that uses it), but again I think sacrificing 
backwards compatibility in this case is simply not worth it.

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       The problem is that having `schema` as attribute in DBApiHook will 
encourage people to use it in their other Hooks deriving from DBApiHook. It 
will be rather difficult to make sure that everyone is using it in "hasattr" 
way. Even if we two remember it now, we will forget about it and other 
committers who are not involved will not even know about this. We could 
potentially automate a pre-commit check if the "DBApiHook" is accessed with 
hasattr, but I think we will never be able to do it with 100% accuracy and I 
think it's simply not worth it for that single field, that can be simply 
replaced with single line for each operator that wants to use it (in __init__):
   
   ```
           self.schema: Optional[str] = kwargs.pop("schema", None)
   ```
   
   I think we should think about DBApiHook as "Public" API of Airflow and any 
change to it should be very, very carefully considered.
   
   Another option would be to add `>=Airflow 2.2` limitation to Postgres 
operator (and any other operator that uses it), but again I think sacrificing 
backwards compatibility in this case is simply not worth it.

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       The problem is that having `schema` as attribute in DBApiHook will 
encourage people to use it in their other Hooks deriving from DBApiHook. It 
will be rather difficult to make sure that everyone is using it in "hasattr" 
way. Even if we two remember it now, we will forget about it and other 
committers who are not involved will not even know about this. We could 
potentially automate a pre-commit check if the "DBApiHook's schema" is accessed 
with hasattr, but I think we will never be able to do it with 100% accuracy and 
I think it's simply not worth it for that single field, that can be simply 
replaced with single line for each operator that wants to use it (in __init__):
   
   ```
           self.schema: Optional[str] = kwargs.pop("schema", None)
   ```
   
   I think we should think about DBApiHook as "Public" API of Airflow and any 
change to it should be very, very carefully considered.
   
   Another option would be to add `>=Airflow 2.2` limitation to Postgres 
operator (and any other operator that uses it), but again I think sacrificing 
backwards compatibility in this case is simply not worth it.

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Ah actually I see an error .in my solution.. I should not do "pop" in 
the args :)

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Removed "pop"  from DBApi to leave it for other hooks. 
   
   Actually I just realized the way it was implemented before - with pop() had 
undesired side effect for Hooks that already used  kwargs.pop() .The side 
effect was that kwargs.pop() in DBApiHook would remove the schema from kwargs 
in derived classes and their `self.schema = kwargs.pop("schema", None)` would 
override the schema to None (!)
   
   Example: mysql:
    
   
https://github.com/apache/airflow/blob/main/airflow/providers/mysql/hooks/mysql.py#L61
   
   So in fact, the original change (not yet released luckily) in DBApiHook was 
even more disruptive than the failed `PostgresHook`. I am actually glad it was 
uncovered now, as it would be far more disruptive it was released in Airflow.
   
   That's why we should be **extremely** careful with changing DBApiHook (and 
BaseHook and similar).
   

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Removed "pop"  from DBApi to leave it for other hooks. 
   
   Actually I just realized the way it was implemented before - with pop() had 
undesired side effect for Hooks that already used  kwargs.pop() .The side 
effect was that kwargs.pop() in DBApiHook would remove the `schema` from kwargs 
in derived classes and their `self.schema = kwargs.pop("schema", None)` would 
override the schema to None (!)
   
   Example: mysql:
    
   
https://github.com/apache/airflow/blob/main/airflow/providers/mysql/hooks/mysql.py#L61
   
   So in fact, the original change (not yet released luckily) in DBApiHook was 
even more disruptive than the failed `PostgresHook`. I am actually glad it was 
uncovered now, as it would be far more disruptive it was released in Airflow.
   
   That's why we should be **extremely** careful with changing DBApiHook (and 
BaseHook and similar).
   

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       😱 🙀 
   

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       I think this is unlikely edge case (much less likely than use of schema 
when it will be defined in DBApiHook). I hardly see the use for that. Why 
somone would like to define a DBApi -derived hook and pass a 'schema' parameter 
to it while also changing it in it's own `__init__`? Seems extremely unlikely, 
also It feels natural that in such case the change should be done before 
`super.__init__()` rather than after.   
   
   We can assume that "schema" is our convention for kwarg para that we should 
use for all DB Hooks.  We can standardise it (it's already commonly used)  and 
release in Airflow 3 DBApiHook I think. 
   
   For now we might want to add some more docs/comments explaining it and some 
Airflow 3 notice about it. WDYT? 

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       I think this is unlikely edge case (much less likely than use of schema 
when it will be defined in DBApiHook as public field). I hardly see the use for 
that. Why somone would like to define a DBApi -derived hook and pass a 'schema' 
parameter to it while also changing it in it's own `__init__`? Seems extremely 
unlikely, also It feels natural that in such case the change should be done 
before `super.__init__()` rather than after.   
   
   We can assume that "schema" is our convention for kwarg para that we should 
use for all DB Hooks.  We can standardise it (it's already commonly used)  and 
release in Airflow 3 DBApiHook I think. 
   
   For now we might want to add some more docs/comments explaining it and some 
Airflow 3 notice about it. WDYT? 

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       Right - I think i fiigured out better way of doing it. I've added the 
'schema' argument explicitly as optional kwarg to DBApiHook and made a comment 
about the "change schema value" in the description of the parameter.
   
   I think it's much better - it makes schema an explicit part of the DBHookAPI 
hook's API, it is fully backwards compatible and it makes it very easy for 
Airflow 3 change - we will simply make the self.schema public and convert all 
the operators that use it in the "old" way.  See the latest fixup.

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       > That's why I'm thinking letting both DbApiHook and any derived hooks 
both set schema might be the best of both worlds here? Then the derived hooks 
could stop setting schema in Airflow 3 (or the next time they get a min core 
version bump really).
   
   The problem with this is - that it has already happened that we overlooked 
that the `DBApiHook.schema` object has been accessed directly by a provider, 
which made it backwards incompatible with Airflow 2.1. We do not have any 
mechanism to prevents this is in the future again, if we have it as a public 
field in DBApiHook. This is a bit of a problem that DBAPiHook is "public API" 
part and by making a public field in this class, we change the API.
   
   The way I proposed - in the last fixup, `schema` becomes part of the DBApi 
'initialization' API. If any other hook stores the schema field as self.schema 
- so be it, it is "its own responsibiliity" - we make it clear now in the 
constructor of the DBApiHook that passing "schema" as kwargs is THE way how to 
override the schema, and by not having a public field, we clearly say that the 
deriving hook cannot expect that it can change it and expect "DBApiHook" 
getUri() method will use it.

##########
File path: airflow/providers/postgres/hooks/postgres.py
##########
@@ -70,6 +70,7 @@ def __init__(self, *args, **kwargs) -> None:
         super().__init__(*args, **kwargs)
         self.connection: Optional[Connection] = kwargs.pop("connection", None)
         self.conn: connection = None
+        self.schema: Optional[str] = kwargs.pop("schema", None)

Review comment:
       > That's why I'm thinking letting both DbApiHook and any derived hooks 
both set schema might be the best of both worlds here? Then the derived hooks 
could stop setting schema in Airflow 3 (or the next time they get a min core 
version bump really).
   
   The problem with this is - that it has already happened that we overlooked 
that the `DBApiHook.schema` object has been accessed directly by a provider, 
which made it backwards incompatible with Airflow 2.1. We do not have any 
mechanism to prevents this is in the future again, if we have it as a public 
field in DBApiHook. This is a bit of a problem that DBAPiHook is "public API" 
part and by making a public field in this class, we change the API.
   
   The way I proposed - in the last fixup, `schema` becomes part of the 
DBApiHook 'initialization' API. If any other hook stores the schema field as 
self.schema - so be it, it is "its own responsibiliity" - we make it clear now 
in the constructor of the DBApiHook that passing "schema" as kwargs is THE way 
how to override the schema, and by not having a public field, we clearly say 
that the deriving hook cannot expect that it can change it and expect 
"DBApiHook" getUri() method will use it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to