codeant-ai-for-open-source[bot] commented on code in PR #40746:
URL: https://github.com/apache/superset/pull/40746#discussion_r3407770291


##########
superset/mcp_service/user/schemas.py:
##########
@@ -278,7 +306,11 @@ def serialize_user_object(
         user_roles = getattr(user, "roles", None)
         if user_roles is not None:
             try:
-                roles = [r.name for r in user_roles if hasattr(r, "name")]
+                roles = [
+                    escape_llm_context_delimiters(r.name)
+                    for r in user_roles
+                    if hasattr(r, "name") and isinstance(r.name, str)
+                ]

Review Comment:
   **Suggestion:** The role extraction in `serialize_user_object` is 
all-or-nothing: if any single role object raises `DetachedInstanceError` or 
`AttributeError`, the list comprehension aborts and `roles` is set to `None`, 
dropping all otherwise valid role names. Handle exceptions per role item (like 
the new `UserInfo` validator does) so one bad role does not erase the full role 
list. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ get_user_info may omit valid roles when one detaches.
   - ⚠️ list_users may redact all roles due to one bad role.
   - ⚠️ LLM clients get incomplete permission context for affected users.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Open `superset/mcp_service/user/schemas.py` and locate `serialize_user_object` at
   lines 291–293, and the roles comprehension and exception handler at lines 45–56
   (diff lines 304–316).
   
   2. In `tests/unit_tests/mcp_service/user/test_schemas.py`, note how `DetachedRole` is
   defined in `test_user_info_ignores_role_with_detached_instance` at lines 62–67 to
   simulate an ORM role whose `.name` property raises `DetachedInstanceError`.
   
   3. Create a FAB-like user object in a test (or REPL) with
   `user.roles = [role_good, role_detached]`, where `role_good.name == "Admin"` and
   `role_detached` is the `DetachedRole` described above, then call
   `serialize_user_object(user, include_sensitive=True, include_roles=True)` from
   `superset/mcp_service/user/schemas.py:291–293`.
   
   4. When the comprehension at diff lines 309–313 executes, accessing
   `role_detached.name` raises `DetachedInstanceError`. The
   `except (AttributeError, DetachedInstanceError)` block at diff lines 314–315 then
   runs and sets `roles = None`, so the returned `UserInfo.roles` (used by
   `get_user_info` at `superset/mcp_service/user/tool/get_user_info.py:16–18` and
   `list_users` at `superset/mcp_service/user/tool/list_users.py:12–19`) drops the
   valid `"Admin"` role instead of keeping it and skipping only the bad role.
   ```
   </details>
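
The per-role handling suggested above could be sketched roughly as follows. This is a minimal illustration, not the code in `superset/mcp_service/user/schemas.py`: `extract_role_names` is a hypothetical helper, the local `DetachedInstanceError` class is a stand-in for `sqlalchemy.orm.exc.DetachedInstanceError` so the snippet runs without SQLAlchemy, and the `escape_llm_context_delimiters` call from the diff is left out.

```python
from typing import Any, Iterable, List


class DetachedInstanceError(Exception):
    """Stand-in for sqlalchemy.orm.exc.DetachedInstanceError (assumption)."""


def extract_role_names(user_roles: Iterable[Any]) -> List[str]:
    """Hypothetical helper: collect role names, skipping only failing roles."""
    names: List[str] = []
    for role in user_roles:
        try:
            name = role.name
        except (AttributeError, DetachedInstanceError):
            # Only this role is skipped; names collected so far are kept.
            continue
        if isinstance(name, str):
            names.append(name)
    return names


class GoodRole:
    name = "Admin"


class DetachedRole:
    """Mimics the DetachedRole test double described in the steps above."""

    @property
    def name(self) -> str:
        raise DetachedInstanceError()


roles = extract_role_names([GoodRole(), DetachedRole()])
print(roles)
```

Moving the `try`/`except` inside the loop is the whole change: a failure on one role no longer discards the names already collected from the others.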
   



##########
tests/unit_tests/mcp_service/user/test_schemas.py:
##########
@@ -0,0 +1,179 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Unit tests for user-related MCP schemas."""
+
+from unittest.mock import MagicMock
+
+import pytest
+from pydantic import ValidationError
+from sqlalchemy.orm.exc import DetachedInstanceError
+
+from superset.mcp_service.user.schemas import (
+    sanitize_for_llm_context,
+    serialize_user_object,
+    UserInfo,
+)
+
+
+def test_user_info_rejects_bare_string_for_roles() -> None:
+    """A plain string must not be silently split into individual characters."""
+    with pytest.raises(ValidationError):
+        UserInfo(roles="Admin")
+
+
+def test_user_info_preserves_empty_roles_list() -> None:
+    """Empty roles should remain [] so callers can distinguish it from None."""
+    info = UserInfo(roles=[])
+    assert info.roles == []
+
+
+def test_user_info_coerces_role_objects_to_names() -> None:
+    """Role-like ORM objects must be converted to their .name strings."""
+    role_admin = MagicMock()
+    role_admin.name = "Admin"
+    role_alpha = MagicMock()
+    role_alpha.name = "Alpha"
+
+    info = UserInfo(roles=[role_admin, role_alpha])
+
+    assert info.roles == ["Admin", "Alpha"]
+
+
+def test_user_info_ignores_role_with_detached_instance() -> None:
+    """Detached ORM roles must not blow up serialization."""
+    role_good = MagicMock()
+    role_good.name = "Admin"
+
+    class DetachedRole:
+        @property
+        def name(self):
+            raise DetachedInstanceError()
+
+    role_detached = DetachedRole()
+
+    info = UserInfo(roles=[role_good, role_detached])
+
+    assert info.roles == ["Admin"]
+
+
+def test_serialize_user_object_round_trip_with_empty_roles() -> None:
+    """serialize_user_object must produce UserInfo.roles == [] for empty roles."""
+    user = MagicMock()
+    user.id = 1
+    user.username = "admin"
+    user.first_name = "Admin"
+    user.last_name = "User"
+    user.active = True
+    user.email = "[email protected]"
+    user.changed_on = None
+    user.roles = []
+
+    info = serialize_user_object(user, include_sensitive=True, include_roles=True)
+
+    assert info is not None
+    assert info.roles == []
+    assert info.username == "admin"
+    assert info.first_name == sanitize_for_llm_context(
+        "Admin", field_path=("first_name",)
+    )
+    assert info.last_name == sanitize_for_llm_context(
+        "User", field_path=("last_name",)
+    )
+    assert info.active is True
+    assert info.email == "[email protected]"
+
+
+def test_serialize_user_object_round_trip_with_role_objects() -> None:
+    """Full from_attributes path through serialize_user_object -> UserInfo."""
+    role_admin = MagicMock()
+    role_admin.name = "Admin"
+
+    user = MagicMock()
+    user.id = 1
+    user.username = "admin"
+    user.first_name = "Admin"
+    user.last_name = "User"
+    user.active = True
+    user.email = "[email protected]"
+    user.changed_on = None
+    user.roles = [role_admin]
+
+    info = serialize_user_object(user, include_sensitive=True, include_roles=True)
+
+    assert info is not None
+    assert info.roles == ["Admin"]
+    assert info.username == "admin"
+    assert info.first_name == sanitize_for_llm_context(
+        "Admin", field_path=("first_name",)
+    )
+    assert info.last_name == sanitize_for_llm_context(
+        "User", field_path=("last_name",)
+    )
+    assert info.active is True
+    assert info.email == "[email protected]"
+
+
+def test_serialize_user_object_skips_roles_when_include_roles_false() -> None:
+    """serialize_user_object must return roles=None when include_roles=False."""
+    role_admin = MagicMock()
+    role_admin.name = "Admin"
+
+    user = MagicMock()
+    user.id = 1
+    user.username = "admin"
+    user.first_name = "Admin"
+    user.last_name = "User"
+    user.active = True
+    user.email = "[email protected]"
+    user.changed_on = None
+    user.roles = [role_admin]
+
+    info = serialize_user_object(user, include_sensitive=True, include_roles=False)
+
+    assert info is not None
+    assert info.roles is None
+    assert info.email == "[email protected]"
+
+
+def test_serialize_user_object_skips_email_when_include_sensitive_false() -> None:
+    """serialize_user_object must return email=None when include_sensitive=False."""
+    role_admin = MagicMock()
+    role_admin.name = "Admin"
+
+    user = MagicMock()
+    user.id = 1
+    user.username = "admin"
+    user.first_name = "Admin"
+    user.last_name = "User"
+    user.active = True
+    user.email = "[email protected]"
+    user.changed_on = None
+    user.roles = [role_admin]
+
+    info = serialize_user_object(user, include_sensitive=False, include_roles=True)
+
+    assert info is not None
+    assert info.email is None
+    assert info.roles == ["Admin"]

Review Comment:
   **Suggestion:** This assertion contradicts the serializer contract and 
current implementation: when `include_sensitive=False`, both sensitive fields 
(`email` and `roles`) are redacted. Expecting `roles == ["Admin"]` will cause 
this test to fail and enforces incorrect behavior. Update the expectation to 
`roles is None`. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Unit test fails enforcing behavior opposite documented contract.
   - ⚠️ Confuses whether roles are treated as sensitive metadata.
   - ⚠️ Can block CI for MCP user schema changes.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Open `superset/mcp_service/user/schemas.py` and inspect `serialize_user_object` at
   lines 291–293 and 45–71 (diff lines 291–317): `roles` is only populated when
   `include_sensitive and include_roles` is true; otherwise `roles` remains `None` and
   is passed as such to `UserInfo`.
   
   2. Open `tests/unit_tests/mcp_service/user/tool/test_user_tools.py` and note
   `test_get_user_info_redacts_sensitive_when_denied` at lines 87–105, which asserts
   `data["email"] is None` and `data["roles"] is None` when
   `user_can_view_data_model_metadata` is patched to return `False`, confirming the
   contract that roles are sensitive and redacted with `include_sensitive=False`.
   
   3. Open `tests/unit_tests/mcp_service/user/test_schemas.py` and locate
   `test_serialize_user_object_skips_email_when_include_sensitive_false` around diff
   lines 153–172, where a user with a single `"Admin"` role is passed to
   `serialize_user_object(user, include_sensitive=False, include_roles=True)`.
   
   4. Run
   `pytest tests/unit_tests/mcp_service/user/test_schemas.py::test_serialize_user_object_skips_email_when_include_sensitive_false`.
   The function returns a `UserInfo` with `info.email is None` and
   `info.roles is None` per the implementation in
   `superset/mcp_service/user/schemas.py`, so the assertion at diff line 172
   (`assert info.roles == ["Admin"]`) fails. The assertion also demands behavior
   that contradicts both the serializer docstring and the tool-level privacy tests.
   ```
   </details>
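
The gating described in step 1 can be illustrated with a small stand-alone sketch. It assumes the behavior as this comment describes it (roles populated only when both flags are true) and does not reproduce the real serializer: `UserInfoSketch`, `serialize_sketch`, and the placeholder email address are hypothetical names and values invented for the example.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, List, Optional


@dataclass
class UserInfoSketch:
    """Hypothetical, trimmed-down stand-in for UserInfo."""

    email: Optional[str] = None
    roles: Optional[List[str]] = None


def serialize_sketch(
    user: Any, include_sensitive: bool, include_roles: bool
) -> UserInfoSketch:
    """Hypothetical stand-in for the gating in serialize_user_object."""
    email = user.email if include_sensitive else None
    roles: Optional[List[str]] = None
    # Roles are treated as sensitive: both flags must be true to expose them.
    if include_sensitive and include_roles:
        roles = [role.name for role in user.roles]
    return UserInfoSketch(email=email, roles=roles)


user = SimpleNamespace(
    email="user@example.invalid",  # placeholder address, not from the PR
    roles=[SimpleNamespace(name="Admin")],
)

info = serialize_sketch(user, include_sensitive=False, include_roles=True)
# Under this contract the corrected expectations would be:
assert info.email is None
assert info.roles is None
```

If the real serializer gates roles this way, the test's final assertion would read `assert info.roles is None`, matching the tool-level privacy test cited in step 2.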
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

