dstandish commented on a change in pull request #21816:
URL: https://github.com/apache/airflow/pull/21816#discussion_r816504357



##########
File path: UPDATING.md
##########
@@ -81,6 +81,47 @@ https://developers.google.com/style/inclusive-documentation
 
 -->
 
+### Deprecation: `Connection.extra` must be JSON-encoded dict
+
+#### TLDR
+
+From Airflow 3.0, the `extra` field in airflow connections must be a 
JSON-encoded python dict.

Review comment:
       ```suggestion
   From Airflow 3.0, the `extra` field in airflow connections must be a 
JSON-encoded Python dict.
   ```

##########
File path: UPDATING.md
##########
@@ -81,6 +81,47 @@ https://developers.google.com/style/inclusive-documentation
 
 -->
 
+### Deprecation: `Connection.extra` must be JSON-encoded dict
+
+#### TLDR
+
+From Airflow 3.0, the `extra` field in airflow connections must be a 
JSON-encoded python dict.
+
+#### What, why, and when?
+
+Airflow's Connection is used for storing credentials.  For storage of 
information that does not
+fit into user / password / host / schema / port, we have the `extra` string 
field.  Its intention
+was always to provide for storage of arbitrary key-value pairs, like 
`no_host_key_check` in the SSH
+hook, or `keyfile_dict` in GCP.
+
+But since the field is string, it's technically been permissible to store any 
string value.  For example
+one could have stored the string value `'my-website.com'` and used this in the 
hook.  But this is a very
+bad practice. One reason is intelligibility: when you look at the value for 
`extra`, you don't have any idea
+what its purpose is.  Better would be to store `{"api_host": 
"my-website.com"}` which at least tells you
+*something* about the value.  Another reason is extensibility: if you store 
the API host as a simple string
+value, what happens if you need to add more information, such as the API 
endpoint, or credentials?  Then
+you would need to convert the string to a dict, and this would be a breaking 
change.
+
+For these reason, starting in Airflow 3.0 we will require that the 
`Connection.extra` field store
+a JSON-encoded python dict.

Review comment:
       ```suggestion
   a JSON-encoded Python dict.
   ```

##########
File path: airflow/models/connection.py
##########
@@ -134,10 +134,39 @@ def __init__(
             self.schema = schema
             self.port = port
             self.extra = extra
+        if self.extra:
+            self._validate_extra(self.extra, self.conn_id)
 
         if self.password:
             mask_secret(self.password)
 
+    @staticmethod
+    def _validate_extra(extra, conn_id) -> None:
+        """
+        Here we verify that ``extra`` is a JSON-encoded python dict.  From 
Airflow 3.0, we should no

Review comment:
       ```suggestion
           Here we verify that ``extra`` is a JSON-encoded Python dict.  From 
Airflow 3.0, we should no
   ```

##########
File path: UPDATING.md
##########
@@ -81,6 +81,47 @@ https://developers.google.com/style/inclusive-documentation
 
 -->
 
+### Deprecation: `Connection.extra` must be JSON-encoded dict
+
+#### TLDR
+
+From Airflow 3.0, the `extra` field in airflow connections must be a 
JSON-encoded python dict.
+
+#### What, why, and when?
+
+Airflow's Connection is used for storing credentials.  For storage of 
information that does not
+fit into user / password / host / schema / port, we have the `extra` string 
field.  Its intention
+was always to provide for storage of arbitrary key-value pairs, like 
`no_host_key_check` in the SSH
+hook, or `keyfile_dict` in GCP.
+
+But since the field is string, it's technically been permissible to store any 
string value.  For example
+one could have stored the string value `'my-website.com'` and used this in the 
hook.  But this is a very
+bad practice. One reason is intelligibility: when you look at the value for 
`extra`, you don't have any idea
+what its purpose is.  Better would be to store `{"api_host": 
"my-website.com"}` which at least tells you
+*something* about the value.  Another reason is extensibility: if you store 
the API host as a simple string
+value, what happens if you need to add more information, such as the API 
endpoint, or credentials?  Then
+you would need to convert the string to a dict, and this would be a breaking 
change.
+
+For these reason, starting in Airflow 3.0 we will require that the 
`Connection.extra` field store
+a JSON-encoded python dict.
+
+#### How will I be affected?
+
+For users of providers that are included in the Airflow codebase, you should 
not have to make any changes
+because in the Airflow codebase we should not allow hooks to misuse the 
`Connection.extra` field in this way.
+
+However, if you have any custom hooks that store something other than JSON 
dict, you will have to update it.
+If you do, you should see a warning any time that this connection is retrieved 
or instantiated (e.g. it should show up in
+task logs).
+
+To see if you have any connections that will need to be updated, you can run 
this command:
+
+```shell
+airflow connections export - 2>&1 >/dev/null | grep 'non-JSON'
+```
+
+This will catch any warnings about connections that are storing something 
other than JSON-encoded python dict in the `extra` field.

Review comment:
       ```suggestion
   This will catch any warnings about connections that are storing something 
other than JSON-encoded Python dict in the `extra` field.
   ```

##########
File path: airflow/models/connection.py
##########
@@ -134,10 +134,39 @@ def __init__(
             self.schema = schema
             self.port = port
             self.extra = extra
+        if self.extra:
+            self._validate_extra(self.extra, self.conn_id)
 
         if self.password:
             mask_secret(self.password)
 
+    @staticmethod
+    def _validate_extra(extra, conn_id) -> None:
+        """
+        Here we verify that ``extra`` is a JSON-encoded python dict.  From 
Airflow 3.0, we should no
+        longer suppress these errors but raise instead.
+        """
+        if extra is None:
+            return None
+        try:
+            extra_parsed = json.loads(extra)
+            if not isinstance(extra_parsed, dict):
+                warnings.warn(
+                    "Encountered JSON value in `extra` which does not parse as 
a dictionary in "
+                    f"connection {conn_id!r}. From Airflow 3.0, the `extra` 
field must contain a JSON "
+                    "representation of a python dict.",

Review comment:
       ```suggestion
                       "representation of a Python dict.",
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to