abnobdoss commented on code in PR #3423:
URL: https://github.com/apache/iceberg-python/pull/3423#discussion_r3321490725


##########
pyiceberg/catalog/rest/__init__.py:
##########


Review Comment:
   Since these 6 lines are repeated, would it make sense to initialize 
`auth_type` to None before line 439 and then use a non-None `auth_type` as the 
gate for running them (the custom validation plus the auth manager 
initialization)?
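
   A rough sketch of that gating pattern, not the PR's actual code. The two 
input sources (`primary`, `fallback`), the helper name, and the error 
wording are hypothetical stand-ins for whatever the duplicated branches 
currently do:

```python
from typing import Any, Optional

CUSTOM = "custom"  # assumed sentinel for the custom auth type


def select_auth_manager(
    primary: Optional[dict[str, Any]],
    fallback: Optional[dict[str, Any]],
) -> Optional[str]:
    """Hypothetical helper: resolve the auth type once, then validate once."""
    auth_type = None
    auth_impl = None

    # Each branch only records what it found; no validation is duplicated here.
    if primary:
        auth_type = primary.get("type")
        auth_impl = primary.get("impl")
    elif fallback:
        auth_type = fallback.get("type")
        auth_impl = fallback.get("impl")

    # Single gate: the shared validation runs only if some branch set auth_type.
    if auth_type is not None:
        if auth_type == CUSTOM and not auth_impl:
            raise ValueError("impl is required for the custom auth type")
        if auth_type != CUSTOM and auth_impl:
            raise ValueError("impl is only allowed for the custom auth type")
        return auth_impl or auth_type
    return None
```

   The return value here is just the name a factory would receive; the real 
code would build the auth manager at that point instead.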



##########
pyiceberg/catalog/rest/__init__.py:
##########
@@ -435,7 +436,16 @@ def _create_session(self) -> Session:
                 elif ssl_client_cert := ssl_client.get(CERT):
                     session.cert = ssl_client_cert
 
-        if auth_config := self.properties.get(AUTH):
+        if raw_auth := self.properties.get(AUTH):
+            # When auth is configured via an environment variable (e.g. PYICEBERG_CATALOG__<NAME>__AUTH),
+            # the value arrives as a JSON string rather than a dict. Decode it before processing.
+            if isinstance(raw_auth, str):
+                try:
+                    auth_config: dict[str, Any] = json.loads(raw_auth)
+                except json.JSONDecodeError as e:
+                    raise ValueError(f"Failed to parse auth configuration as JSON: {raw_auth!r}") from e
+            else:
+                auth_config = raw_auth
             auth_type = auth_config.get("type")

Review Comment:
   The string may be valid JSON without being a dictionary (for example, a 
list or a number). Should we add a guard here that checks the decoded value is 
a dict?
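
   One possible shape for that guard, as a minimal sketch. The helper name 
`decode_auth_config` and the wording of the second error message are 
assumptions, not part of the PR:

```python
import json
from typing import Any


def decode_auth_config(raw_auth: Any) -> dict[str, Any]:
    """Hypothetical helper: normalize the AUTH property to a dict."""
    if isinstance(raw_auth, str):
        try:
            decoded = json.loads(raw_auth)
        except json.JSONDecodeError as e:
            raise ValueError(f"Failed to parse auth configuration as JSON: {raw_auth!r}") from e
    else:
        decoded = raw_auth

    # json.loads also accepts lists, numbers, strings, booleans and null,
    # so the decoded value has to be checked before calling .get() on it.
    if not isinstance(decoded, dict):
        raise ValueError(
            f"Auth configuration must be a JSON object, got {type(decoded).__name__}: {raw_auth!r}"
        )
    return decoded
```

   With this in place, an input such as `'["basic"]'` fails with a 
ValueError instead of an AttributeError on `auth_config.get("type")`.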



##########
pyiceberg/catalog/rest/__init__.py:
##########
@@ -435,7 +436,16 @@ def _create_session(self) -> Session:
                 elif ssl_client_cert := ssl_client.get(CERT):
                     session.cert = ssl_client_cert
 
-        if auth_config := self.properties.get(AUTH):
+        if raw_auth := self.properties.get(AUTH):

Review Comment:
   If we record which env var the auth config is loaded from (here, the 
dictionary-valued one), we can make the custom auth type errors later on a bit 
clearer.
   ```suggestion
           if raw_auth := self.properties.get(AUTH):
               source_env_var = f"PYICEBERG_CATALOG__{self.name.upper()}__AUTH"
   ```
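
   To illustrate how the variable name could feed into a later error, here is 
a small sketch. Both helper names and the message text are hypothetical, and it 
assumes the catalog name maps to the env var simply by upper-casing. Since the 
same property can presumably also come from a config file, the message hedges 
with "if set via":

```python
def auth_env_var(catalog_name: str) -> str:
    """Env var assumed to carry the auth dict for this catalog."""
    return f"PYICEBERG_CATALOG__{catalog_name.upper()}__AUTH"


def missing_impl_message(catalog_name: str) -> str:
    """Hypothetical error text that points the user at the variable to fix."""
    return (
        "auth.impl must be specified when using custom auth.type "
        f"(if set via environment, check {auth_env_var(catalog_name)})"
    )
```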



##########
tests/catalog/test_rest.py:
##########
@@ -3167,3 +3167,135 @@ def test_load_table_without_storage_credentials(
     )
     assert actual.metadata.model_dump() == expected.metadata.model_dump()
     assert actual == expected
+
+
+# Tests for issue #3422: REST catalog auth cannot be configured via environment
+# variables unless auth JSON strings are decoded.
+
+
+def test_rest_catalog_with_basic_auth_as_json_string(rest_mock: Mocker) -> None:

Review Comment:
   Would it make sense to combine these two tests via parameterization so we 
can extend them to cover the remaining auth types and a few edge cases? In 
particular, google and entra accept scopes as list[str], and oauth2 accepts 
refresh_margin and expires_in as ints. Do those get parsed downstream, or do 
they face similar parsing issues when passed as a JSON string?
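
   As a starting point, a table-driven sketch of the cases in question. The 
nested payload layout (`{"type": X, X: {...}}`) and the sample values are 
assumptions. It only checks what `json.loads` yields, not what the auth 
managers do with it afterwards, and the same table could be passed to 
`pytest.mark.parametrize`:

```python
import json

# (raw JSON string, auth type, key to inspect, expected Python type)
CASES = [
    ('{"type": "google", "google": {"scopes": ["scope-a", "scope-b"]}}', "google", "scopes", list),
    ('{"type": "entra", "entra": {"scopes": ["scope-a"]}}', "entra", "scopes", list),
    ('{"type": "oauth2", "oauth2": {"refresh_margin": 60}}', "oauth2", "refresh_margin", int),
    ('{"type": "oauth2", "oauth2": {"expires_in": 3600}}', "oauth2", "expires_in", int),
]


def decoded_type_matches(raw: str, auth_type: str, key: str, expected: type) -> bool:
    """Decode the JSON string and check the nested value kept its native type."""
    config = json.loads(raw)
    return isinstance(config[auth_type][key], expected)


results = [decoded_type_matches(*case) for case in CASES]
```

   JSON arrays and integers decode to Python lists and ints, so values 
written as proper JSON should arrive typed. A value quoted as a string in the 
env var (e.g. `"expires_in": "3600"`) would still arrive as a str.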



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

