Re: [PR] fix(mcp): relax column name regex, improve generate_chart validation errors and examples [superset]

via GitHub Wed, 13 May 2026 09:55:10 -0700


codeant-ai-for-open-source[bot] commented on code in PR #39915:
URL: https://github.com/apache/superset/pull/39915#discussion_r3236044872



##########
tests/unit_tests/mcp_service/chart/test_chart_schemas.py:
##########
@@ -778,3 +778,137 @@ def 
test_client_warnings_discarded_even_when_server_also_warns(self) -> None:
         assert len(req.sanitization_warnings) == 1
         assert "chart_name" in req.sanitization_warnings[0]
         assert "injected" not in req.sanitization_warnings[0]
+
+
+class TestColumnRefNameRelaxedPattern:
+    """ColumnRef.name no longer enforces a strict regex pattern.
+
+    Many valid database column names were previously rejected:
+    - Names starting with a digit (e.g. "1Q_revenue")
+    - Names with locale-specific characters
+    The field_validator sanitize_name() still blocks XSS and SQL injection.
+    """
+
+    def test_digit_prefixed_name_accepted(self) -> None:
+        """Column names starting with a digit must now be accepted."""
+        col = ColumnRef(name="1Q_revenue")
+        assert col.name == "1Q_revenue"
+
+    def test_name_with_hyphen_accepted(self) -> None:
+        col = ColumnRef(name="order-date")
+        assert col.name == "order-date"
+
+    def test_name_with_dot_accepted(self) -> None:
+        col = ColumnRef(name="schema.column")
+        assert col.name == "schema.column"
+
+    def test_name_with_spaces_accepted(self) -> None:
+        col = ColumnRef(name="Total Revenue")
+        assert col.name == "Total Revenue"
+
+    def test_script_tag_neutralized(self) -> None:
+        """sanitize_name() neutralizes script-tag XSS via nh3.
+        Depending on nh3 version, nh3 either strips the entire script element
+        including its content (leaving empty → ValidationError) or strips only
+        the tag delimiters (leaving 'alert(1)'). Either way, no raw <script>
+        tag is stored in the column name."""
+        try:
+            col = ColumnRef(name="<script>alert(1)</script>")
+            assert "<script>" not in col.name
+        except ValidationError as exc:
+            # nh3 stripped entire element; empty-value guard raises with this 
message
+            assert "cannot be empty" in str(exc)  # noqa: PT017

Review Comment:
   **Suggestion:** This test does not enforce the intended guarantee that 
script-tag payloads are rejected or converted to a valid non-empty identifier: 
it passes even when the sanitized name becomes an empty string. Tighten the 
assertion so the success path also validates that the resulting value is 
non-empty, otherwise this test can mask a real validation bug. [incomplete 
implementation]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Test may pass with empty sanitized column name.
   - ⚠️ Potential masking of chart column-name validation regressions.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. In `superset/mcp_service/chart/schemas.py:26-38`, `ColumnRef.name` is 
declared as a
   `str` with `min_length=1`, and its field validator `sanitize_name` at
   `superset/mcp_service/chart/schemas.py:75-83` calls `sanitize_user_input(v, 
"Column name",
   max_length=255, check_sql_keywords=True)` and returns the result without any 
explicit
   post-sanitization emptiness check.
   
   2. `sanitize_user_input` in 
`superset/mcp_service/utils/sanitization.py:50-114` trims and
   rejects empty input before sanitization (lines 82-93), then strips HTML tags 
with
   `_strip_html_tags` (lines 101-103) and returns the sanitized value without 
re-checking
   emptiness, so script-only HTML such as `<script></script>` or a version of 
nh3 that drops
   script text could produce an empty string that still passes this helper.
   
   3. The test `test_script_tag_neutralized` in
   `tests/unit_tests/mcp_service/chart/test_chart_schemas.py:70-81` wraps
   `ColumnRef(name="<script>alert(1)</script>")` in a try/except; in the 
success path it only
   asserts `"<script>" not in col.name`, and in the except path it asserts 
`"cannot be
   empty"` appears in the ValidationError message, but it never asserts that 
`col.name` is
   non-empty in the success case, so a sanitized empty string (e.g., from
   `<script></script>`) would satisfy the test.
   
   4. Because `ChartConfig` models (e.g., XY/Table configs used in 
`GenerateChartRequest` at
   `superset/mcp_service/chart/schemas.py:1472-1477` and their tests at
   `tests/unit_tests/mcp_service/chart/test_chart_schemas.py:676-695`) rely on
   `ColumnRef.name` for query construction, an undetected regression where 
sanitization
   returns `""` would allow a ColumnRef with an empty name into those 
configurations; the
   current `test_script_tag_neutralized` would still pass and thus can mask 
this class of
   validation bug, whereas tightening the assertion to require a non-empty 
sanitized name in
   the success path would surface such regressions.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20tests%2Funit_tests%2Fmcp_service%2Fchart%2Ftest_chart_schemas.py%0A%2A%2ALine%3A%2A%2A%20815%3A820%0A%2A%2AComment%3A%2A%2A%0A%09%2AIncomplete%20Implementation%3A%20This%20test%20does%20not%20enforce%20the%20intended%20guarantee%20that%20script-tag%20payloads%20are%20rejected%20or%20converted%20to%20a%20valid%20non-empty%20identifier%3A%20it%20passes%20even%20when%20the%20sanitized%20name%20becomes%20an%20empty%20string.%20Tighten%20the%20assertion%20so%20the%20success%20path%20also%20validates%20that%20the%20resulting%20value%20is%20non-empty%2C%20otherwise%20this%20test%20can%20mask%20a%20real%20validation%20bug.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%2
 
0fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20tests%2Funit_tests%2Fmcp_service%2Fchart%2Ftest_chart_schemas.py%0A%2A%2ALine%3A%2A%2A%20815%3A820%0A%2A%2AComment%3A%2A%2A%0A%09%2AIncomplete%20Implementation%3A%20This%20test%20does%20not%20enforce%20the%20intended%20guarantee%20that%20script-tag%20payloads%20are%20rejected%20or%20converted%20to%20a%20valid%20non-empty%20identifier%3A%20it%20passes%20even%20when%20the%20sanitized%20name%20becomes%20an%20empty%20string.%20Tighten%20the%20assertion%20so%20the%20success%20path%20also%20validates%2
 
0that%20the%20resulting%20value%20is%20non-empty%2C%20otherwise%20this%20test%20can%20mask%20a%20real%20validation%20bug.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/mcp_service/chart/test_chart_schemas.py
   **Line:** 815:820
   **Comment:**
        *Incomplete Implementation: This test does not enforce the intended 
guarantee that script-tag payloads are rejected or converted to a valid 
non-empty identifier: it passes even when the sanitized name becomes an empty 
string. Tighten the assertion so the success path also validates that the 
resulting value is non-empty, otherwise this test can mask a real validation 
bug.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39915&comment_hash=b8b82c0b540ce0c3548d41c3de9f343402611a020db4fbd396108fca13319a90&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39915&comment_hash=b8b82c0b540ce0c3548d41c3de9f343402611a020db4fbd396108fca13319a90&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(mcp): relax column name regex, improve generate_chart validation errors and examples [superset]

Reply via email to