vincbeck commented on code in PR #25432:
URL: https://github.com/apache/airflow/pull/25432#discussion_r939093095


##########
airflow/providers/amazon/aws/secrets/secrets_manager.py:
##########
@@ -94,6 +95,9 @@ class SecretsManagerBackend(BaseSecretsBackend, LoggingMixin):
     :param full_url_mode: if True, the secrets must be stored as one conn URI 
in just one field per secret.
         If False (set it as false in backend_kwargs), you can store the secret 
using different
         fields (password, user...).
+    :param secret_values_are_urlencoded: If True, and full_url_mode is False, 
then the values are assumed to

Review Comment:
   For such boolean value, the verb is/are is always at the beginning
   
   ```suggestion
       :param are_secret_values_urlencoded: If True, and full_url_mode is 
False, then the values are assumed to
   ```



##########
airflow/providers/amazon/aws/secrets/secrets_manager.py:
##########
@@ -150,31 +175,154 @@ def _format_uri_with_extra(secret, conn_string):
 
         return conn_string
 
-    def get_uri_from_secret(self, secret):
+    def get_connection(self, conn_id: str) -> Optional[Connection]:
+        if not self.full_url_mode:
+            secret_string = self._get_secret(self.connections_prefix, conn_id)
+            secret_dict = self._deserialize_json_string(secret_string)
+
+            if not secret_dict:
+                return None
+
+            if 'extra' in secret_dict and isinstance(secret_dict['extra'], 
str):
+                secret_dict['extra'] = 
self._deserialize_json_string(secret_dict['extra'])
+
+            data = self._standardize_secret_keys(secret_dict)
+
+            if self.secret_values_are_urlencoded:
+                data = self._remove_escaping_in_secret_dict(secret=data, 
conn_id=conn_id)
+
+            port: Optional[int] = None
+
+            if data['port'] is not None:
+                port = int(data['port'])
+
+            return Connection(
+                login=data['user'],
+                password=data['password'],
+                host=data['host'],
+                port=port,
+                schema=data['schema'],
+                conn_type=data['conn_type'],
+                extra=data['extra'],
+            )
+
+        return super().get_connection(conn_id=conn_id)
+
+    def _standardize_secret_keys(self, secret: Dict[str, Any]) -> Dict[str, 
Any]:
+        """Standardize the names of the keys in the dict. These keys align 
with"""
         possible_words_for_conn_fields = {
             'user': ['user', 'username', 'login', 'user_name'],
             'password': ['password', 'pass', 'key'],
             'host': ['host', 'remote_host', 'server'],
             'port': ['port'],
             'schema': ['database', 'schema'],
             'conn_type': ['conn_type', 'conn_id', 'connection_type', 'engine'],
+            'extra': ['extra'],
         }
 
         for conn_field, extra_words in self.extra_conn_words.items():
             possible_words_for_conn_fields[conn_field].extend(extra_words)
 
-        conn_d = {}
+        conn_d: Dict[str, Any] = {}
         for conn_field, possible_words in 
possible_words_for_conn_fields.items():
             try:
                 conn_d[conn_field] = [v for k, v in secret.items() if k in 
possible_words][0]
             except IndexError:
-                conn_d[conn_field] = ''
+                conn_d[conn_field] = None
 
-        conn_string = 
"{conn_type}://{user}:{password}@{host}:{port}/{schema}".format(**conn_d)
+        return conn_d
 
+    def get_uri_from_secret(self, secret: Dict[str, str]) -> str:
+        conn_d: Dict[str, str] = {k: v if v else '' for k, v in 
self._standardize_secret_keys(secret).items()}
+        conn_string = 
"{conn_type}://{user}:{password}@{host}:{port}/{schema}".format(**conn_d)
         return self._format_uri_with_extra(secret, conn_string)
 
-    def get_conn_value(self, conn_id: str):
+    def _deserialize_json_string(self, value: Optional[str]) -> 
Optional[Dict[Any, Any]]:
+        if not value:
+            return None
+        try:
+            # Use ast.literal_eval for backwards compatibility.
+            # Previous version of this code had a comment saying that using 
json.loads caused errors.
+            # This likely means people were using dict reprs instead of valid 
JSONs.
+            res: Dict[str, Any] = json.loads(value)
+        except json.JSONDecodeError:
+            try:
+                res = ast.literal_eval(value) if value else None
+                warnings.warn(
+                    f'In future versions, `{type(self).__name__}` will only 
support valid JSONs, not dict'
+                    ' reprs. Please make sure your secret is a valid JSON.'
+                )
+            except ValueError:  # 'malformed node or string: ' error, for 
empty conns
+                return None
+
+        return res
+
+    def _remove_escaping_in_secret_dict(self, secret: Dict[str, Any], conn_id: 
str) -> Dict[str, Any]:
+        # When ``unquote(v) == v``, then removing unquote won't affect the 
user, regardless of
+        # whether or not ``v`` is URL-encoded. For example, "foo bar" is not 
URL-encoded. But
+        # because decoding it doesn't affect the value, then it will migrate 
safely when
+        # ``unquote`` gets removed.
+        #
+        # When parameters are URL-encoded, but decoding is idempotent, we need 
to warn the user
+        # to un-escape their secrets. For example, if "foo%20bar" is a 
URL-encoded string, then
+        # decoding is idempotent because ``unquote(unquote("foo%20bar")) == 
unquote("foo%20bar")``.
+        #
+        # In the rare situation that value is URL-encoded but the decoding is 
_not_ idempotent,
+        # this causes a major issue. For example, if "foo%2520bar" is 
URL-encoded, then decoding is
+        # _not_ idempotent because ``unquote(unquote("foo%2520bar")) != 
unquote("foo%2520bar")``
+        #
+        # This causes a problem for migration because if the user decodes 
their value, we cannot
+        # infer that is the case by looking at the decoded value (from our 
vantage point, it will
+        # look to be URL-encoded.)
+        #
+        # So when this uncommon situation occurs, the user _must_ adjust the 
configuration and set
+        # ``parameters_are_urlencoded`` to False to migrate safely. In all 
other cases, we do not
+        # need the user to adjust this object to migrate; they can transition 
their secrets with
+        # the default configuration.
+
+        warn_user = False
+        idempotent = True
+
+        for k, v in secret.copy().items():
+
+            if k == "extra" and isinstance(v, dict):
+                # The old behavior was that extras were _not_ urlencoded 
inside the secret.
+                # If they were urlencoded (e.g. "foo%20bar"), then they would 
be re-urlencoded
+                # (e.g. "foo%20bar" becomes "foo%2520bar") and then unquoted 
once when parsed.
+                # So we should just allow the extra dict to remain as-is.
+                continue
+
+            elif v is not None:
+                v_unquoted = unquote(v)
+                if v != v_unquoted:
+                    secret[k] = unquote(v)
+                    warn_user = True
+
+                    # Check to see if decoding is idempotent.
+                    if v_unquoted == unquote(v_unquoted):
+                        idempotent = False
+
+        if warn_user:
+            msg = (
+                "When ``full_url_mode=True``, URL-encoding secret values is 
deprecated. In future versions, "
+                f" this value will not be un-escaped. For the conn_id 
{conn_id!r}, please remove the"
+                " URL-encoding."
+                "\n\nThis warning was raised because the SecretsManagerBackend 
detected that this connection"
+                " was URL-encoded."
+            )
+            if idempotent:
+                msg = f" Once the values for conn_id {conn_id!r} are decoded, 
this warning will go away."
+            if not idempotent:
+                msg += (
+                    " In addition to decoding the values for your connection, 
you must also set"
+                    " ``secret_values_are_urlencoded=False`` for your config 
variable"
+                    " ``secrets.backend_kwargs`` because this connection's URL 
encoding is not idempotent."

Review Comment:
   I am not sure `because this connection's URL encoding is not idempotent` 
gives enough details to the user in order to understand the issue. I would add 
a link to the documentation you are updating in this PR



##########
docs/apache-airflow-providers-amazon/secrets-backends/aws-secrets-manager.rst:
##########
@@ -83,6 +83,27 @@ Verify that you can get the secret:
 
 If you don't want to use any ``connections_prefix`` for retrieving 
connections, set it as an empty string ``""`` in the configuration.
 
+URL-Encoding of Secrets When Full URL Mode is False
+"""""""""""""""""""""""""""""""""""""""""""""""""""
+
+Previous versions of the Amazon provider package required values in the AWS 
secret to be URL-encoded when the setting ``full_url_mode`` is ``false``.
+This behavior is now deprecated, and will be removed at a future date.
+
+In most cases, you should not have any issues migrating your secrets to not 
being URL-encoded in advance of the deprecation.
+Simply decoding your secret values will work, and no further changes are 
required.
+
+In rare circumstances, when URL-encoding is not idempotent, the 
``DeprecationWarning`` will tell you to add a new parameter to your 
``backend_kwargs``.

Review Comment:
   I would add an example to explain what idempotent means in this situation. I 
would actually reuse the example you gave in comments
   ```
   For example, if "foo%2520bar" is URL-encoded, then decoding is _not_ 
idempotent because ``unquote(unquote("foo%2520bar")) != unquote("foo%2520bar")``
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to