potiuk commented on a change in pull request #20190:
URL: https://github.com/apache/airflow/pull/20190#discussion_r767181577
##########
File path: airflow/providers/apache/spark/hooks/spark_sql.py
##########
@@ -83,12 +83,13 @@ def __init__(
yarn_queue: Optional[str] = None,
) -> None:
super().__init__()
+ options: Dict = {}
+ conn: Optional[Connection] = None
try:
- conn: "Optional[Connection]" = self.get_connection(conn_id)
Review comment:
Hm @josh-fell (and @kaxil ) - something struck me, so I just looked closer
at https://github.com/apache/airflow/pull/18339
I think while discussing it I made a silent assumption (which I see now was
wrong) that the connection was created as part of the "operator's" init, but this
is about creating it in the Hook's init, which IMHO is quite a legitimate use
case (as long as you do not instantiate the Hook in the operator's init()). And
it is a pretty common pattern in Airflow (and one we actually encourage).
I double-checked the Databricks code, and the only place I could see the hook
being instantiated was _get_hook(), and the only place where _get_hook() is
called was inside the "execute" and "on_kill" methods of the Databricks
operator - so that all sounds pretty legitimate.
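To illustrate what I mean, here is a minimal sketch of that pattern (MyHook and
MyOperator are hypothetical names, not the actual Databricks classes): the hook
resolves its Connection in its own __init__, but the operator only builds the
hook on demand inside execute()/on_kill(), so nothing touches the Connection
at DAG-parse time:

```python
from airflow.hooks.base import BaseHook
from airflow.models import BaseOperator


class MyHook(BaseHook):
    """Hypothetical hook that resolves its Connection in __init__."""

    def __init__(self, conn_id: str = "my_default") -> None:
        super().__init__()
        # Fine to do here *if* the hook itself is only created at task
        # runtime (see MyOperator below), not during DAG parsing.
        self.conn = self.get_connection(conn_id)


class MyOperator(BaseOperator):
    """Hypothetical operator following the Databricks-style pattern."""

    def __init__(self, *, conn_id: str = "my_default", **kwargs) -> None:
        super().__init__(**kwargs)
        # Only remember the conn_id; do NOT instantiate the hook here.
        self.conn_id = conn_id

    def _get_hook(self) -> MyHook:
        return MyHook(conn_id=self.conn_id)

    def execute(self, context):
        hook = self._get_hook()  # hook (and Connection lookup) on demand
        ...

    def on_kill(self) -> None:
        hook = self._get_hook()
        ...
```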
Was not the whole issue caused by a misunderstanding of whose init it was? I
think we have maaaaany cases where the Hook is created "on-demand" in the
execute() method of the operator; also, if you decide to create the Hook()
inside a `@task`-decorated function, it should work really well (see the
second sketch below).
Are my eyes cheating me? Or maybe I am missing something?
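And for completeness, the `@task` case mentioned above - again just a sketch
reusing the hypothetical MyHook from before, with "my_default" as a placeholder
connection id:

```python
from airflow.decorators import task


@task
def run_query():
    # The hook (and its Connection lookup) is only created when the task
    # actually runs on a worker, never at DAG-parse time.
    hook = MyHook(conn_id="my_default")
    ...
```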