dstandish commented on a change in pull request #21816: URL: https://github.com/apache/airflow/pull/21816#discussion_r816504357
########## File path: UPDATING.md ########## @@ -81,6 +81,47 @@ https://developers.google.com/style/inclusive-documentation --> +### Deprecation: `Connection.extra` must be JSON-encoded dict + +#### TLDR + +From Airflow 3.0, the `extra` field in airflow connections must be a JSON-encoded python dict. Review comment: ```suggestion From Airflow 3.0, the `extra` field in airflow connections must be a JSON-encoded Python dict. ``` ########## File path: UPDATING.md ########## @@ -81,6 +81,47 @@ https://developers.google.com/style/inclusive-documentation --> +### Deprecation: `Connection.extra` must be JSON-encoded dict + +#### TLDR + +From Airflow 3.0, the `extra` field in airflow connections must be a JSON-encoded python dict. + +#### What, why, and when? + +Airflow's Connection is used for storing credentials. For storage of information that does not +fit into user / password / host / schema / port, we have the `extra` string field. Its intention +was always to provide for storage of arbitrary key-value pairs, like `no_host_key_check` in the SSH +hook, or `keyfile_dict` in GCP. + +But since the field is string, it's technically been permissible to store any string value. For example +one could have stored the string value `'my-website.com'` and used this in the hook. But this is a very +bad practice. One reason is intelligibility: when you look at the value for `extra`, you don't have any idea +what its purpose is. Better would be to store `{"api_host": "my-website.com"}` which at least tells you +*something* about the value. Another reason is extensibility: if you store the API host as a simple string +value, what happens if you need to add more information, such as the API endpoint, or credentials? Then +you would need to convert the string to a dict, and this would be a breaking change. + +For these reason, starting in Airflow 3.0 we will require that the `Connection.extra` field store +a JSON-encoded python dict. Review comment: ```suggestion a JSON-encoded Python dict. ``` ########## File path: airflow/models/connection.py ########## @@ -134,10 +134,39 @@ def __init__( self.schema = schema self.port = port self.extra = extra + if self.extra: + self._validate_extra(self.extra, self.conn_id) if self.password: mask_secret(self.password) + @staticmethod + def _validate_extra(extra, conn_id) -> None: + """ + Here we verify that ``extra`` is a JSON-encoded python dict. From Airflow 3.0, we should no Review comment: ```suggestion Here we verify that ``extra`` is a JSON-encoded Python dict. From Airflow 3.0, we should no ``` ########## File path: UPDATING.md ########## @@ -81,6 +81,47 @@ https://developers.google.com/style/inclusive-documentation --> +### Deprecation: `Connection.extra` must be JSON-encoded dict + +#### TLDR + +From Airflow 3.0, the `extra` field in airflow connections must be a JSON-encoded python dict. + +#### What, why, and when? + +Airflow's Connection is used for storing credentials. For storage of information that does not +fit into user / password / host / schema / port, we have the `extra` string field. Its intention +was always to provide for storage of arbitrary key-value pairs, like `no_host_key_check` in the SSH +hook, or `keyfile_dict` in GCP. + +But since the field is string, it's technically been permissible to store any string value. For example +one could have stored the string value `'my-website.com'` and used this in the hook. But this is a very +bad practice. One reason is intelligibility: when you look at the value for `extra`, you don't have any idea +what its purpose is. Better would be to store `{"api_host": "my-website.com"}` which at least tells you +*something* about the value. Another reason is extensibility: if you store the API host as a simple string +value, what happens if you need to add more information, such as the API endpoint, or credentials? Then +you would need to convert the string to a dict, and this would be a breaking change. + +For these reason, starting in Airflow 3.0 we will require that the `Connection.extra` field store +a JSON-encoded python dict. + +#### How will I be affected? + +For users of providers that are included in the Airflow codebase, you should not have to make any changes +because in the Airflow codebase we should not allow hooks to misuse the `Connection.extra` field in this way. + +However, if you have any custom hooks that store something other than JSON dict, you will have to update it. +If you do, you should see a warning any time that this connection is retrieved or instantiated (e.g. it should show up in +task logs). + +To see if you have any connections that will need to be updated, you can run this command: + +```shell +airflow connections export - 2>&1 >/dev/null | grep 'non-JSON' +``` + +This will catch any warnings about connections that are storing something other than JSON-encoded python dict in the `extra` field. Review comment: ```suggestion This will catch any warnings about connections that are storing something other than JSON-encoded Python dict in the `extra` field. ``` ########## File path: airflow/models/connection.py ########## @@ -134,10 +134,39 @@ def __init__( self.schema = schema self.port = port self.extra = extra + if self.extra: + self._validate_extra(self.extra, self.conn_id) if self.password: mask_secret(self.password) + @staticmethod + def _validate_extra(extra, conn_id) -> None: + """ + Here we verify that ``extra`` is a JSON-encoded python dict. From Airflow 3.0, we should no + longer suppress these errors but raise instead. + """ + if extra is None: + return None + try: + extra_parsed = json.loads(extra) + if not isinstance(extra_parsed, dict): + warnings.warn( + "Encountered JSON value in `extra` which does not parse as a dictionary in " + f"connection {conn_id!r}. From Airflow 3.0, the `extra` field must contain a JSON " + "representation of a python dict.", Review comment: ```suggestion "representation of a Python dict.", ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
