codeant-ai-for-open-source[bot] commented on code in PR #39915:
URL: https://github.com/apache/superset/pull/39915#discussion_r3236044872
##########
tests/unit_tests/mcp_service/chart/test_chart_schemas.py:
##########
@@ -778,3 +778,137 @@ def
test_client_warnings_discarded_even_when_server_also_warns(self) -> None:
assert len(req.sanitization_warnings) == 1
assert "chart_name" in req.sanitization_warnings[0]
assert "injected" not in req.sanitization_warnings[0]
+
+
+class TestColumnRefNameRelaxedPattern:
+ """ColumnRef.name no longer enforces a strict regex pattern.
+
+ Many valid database column names were previously rejected:
+ - Names starting with a digit (e.g. "1Q_revenue")
+ - Names with locale-specific characters
+ The field_validator sanitize_name() still blocks XSS and SQL injection.
+ """
+
+ def test_digit_prefixed_name_accepted(self) -> None:
+ """Column names starting with a digit must now be accepted."""
+ col = ColumnRef(name="1Q_revenue")
+ assert col.name == "1Q_revenue"
+
+ def test_name_with_hyphen_accepted(self) -> None:
+ col = ColumnRef(name="order-date")
+ assert col.name == "order-date"
+
+ def test_name_with_dot_accepted(self) -> None:
+ col = ColumnRef(name="schema.column")
+ assert col.name == "schema.column"
+
+ def test_name_with_spaces_accepted(self) -> None:
+ col = ColumnRef(name="Total Revenue")
+ assert col.name == "Total Revenue"
+
+ def test_script_tag_neutralized(self) -> None:
+ """sanitize_name() neutralizes script-tag XSS via nh3.
+ Depending on nh3 version, nh3 either strips the entire script element
+ including its content (leaving empty → ValidationError) or strips only
+ the tag delimiters (leaving 'alert(1)'). Either way, no raw <script>
+ tag is stored in the column name."""
+ try:
+ col = ColumnRef(name="<script>alert(1)</script>")
+ assert "<script>" not in col.name
+ except ValidationError as exc:
+ # nh3 stripped entire element; empty-value guard raises with this
message
+ assert "cannot be empty" in str(exc) # noqa: PT017
Review Comment:
**Suggestion:** This test does not enforce the intended guarantee that
script-tag payloads are rejected or converted to a valid non-empty identifier:
it passes even when the sanitized name becomes an empty string. Tighten the
assertion so the success path also validates that the resulting value is
non-empty, otherwise this test can mask a real validation bug. [incomplete
implementation]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Test may pass with empty sanitized column name.
- ⚠️ Potential masking of chart column-name validation regressions.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. In `superset/mcp_service/chart/schemas.py:26-38`, `ColumnRef.name` is
declared as a
`str` with `min_length=1`, and its field validator `sanitize_name` at
`superset/mcp_service/chart/schemas.py:75-83` calls `sanitize_user_input(v,
"Column name",
max_length=255, check_sql_keywords=True)` and returns the result without any
explicit
post-sanitization emptiness check.
2. `sanitize_user_input` in
`superset/mcp_service/utils/sanitization.py:50-114` trims and
rejects empty input before sanitization (lines 82-93), then strips HTML tags
with
`_strip_html_tags` (lines 101-103) and returns the sanitized value without
re-checking
emptiness, so script-only HTML such as `<script></script>` or a version of
nh3 that drops
script text could produce an empty string that still passes this helper.
3. The test `test_script_tag_neutralized` in
`tests/unit_tests/mcp_service/chart/test_chart_schemas.py:70-81` wraps
`ColumnRef(name="<script>alert(1)</script>")` in a try/except; in the
success path it only
asserts `"<script>" not in col.name`, and in the except path it asserts
`"cannot be
empty"` appears in the ValidationError message, but it never asserts that
`col.name` is
non-empty in the success case, so a sanitized empty string (e.g., from
`<script></script>`) would satisfy the test.
4. Because `ChartConfig` models (e.g., XY/Table configs used in
`GenerateChartRequest` at
`superset/mcp_service/chart/schemas.py:1472-1477` and their tests at
`tests/unit_tests/mcp_service/chart/test_chart_schemas.py:676-695`) rely on
`ColumnRef.name` for query construction, an undetected regression where
sanitization
returns `""` would allow a ColumnRef with an empty name into those
configurations; the
current `test_script_tag_neutralized` would still pass and thus can mask
this class of
validation bug, whereas tightening the assertion to require a non-empty
sanitized name in
the success path would surface such regressions.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20tests%2Funit_tests%2Fmcp_service%2Fchart%2Ftest_chart_schemas.py%0A%2A%2ALine%3A%2A%2A%20815%3A820%0A%2A%2AComment%3A%2A%2A%0A%09%2AIncomplete%20Implementation%3A%20This%20test%20does%20not%20enforce%20the%20intended%20guarantee%20that%20script-tag%20payloads%20are%20rejected%20or%20converted%20to%20a%20valid%20non-empty%20identifier%3A%20it%20passes%20even%20when%20the%20sanitized%20name%20becomes%20an%20empty%20string.%20Tighten%20the%20assertion%20so%20the%20success%20path%20also%20validates%20that%20the%20resulting%20value%20is%20non-empty%2C%20otherwise%20this%20test%20can%20mask%20a%20real%20validation%20bug.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%2
0fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt=This%20is%20a%20comment%20left%20during%20a%20code%20review.%0A%0A%2A%2APath%3A%2A%2A%20tests%2Funit_tests%2Fmcp_service%2Fchart%2Ftest_chart_schemas.py%0A%2A%2ALine%3A%2A%2A%20815%3A820%0A%2A%2AComment%3A%2A%2A%0A%09%2AIncomplete%20Implementation%3A%20This%20test%20does%20not%20enforce%20the%20intended%20guarantee%20that%20script-tag%20payloads%20are%20rejected%20or%20converted%20to%20a%20valid%20non-empty%20identifier%3A%20it%20passes%20even%20when%20the%20sanitized%20name%20becomes%20an%20empty%20string.%20Tighten%20the%20assertion%20so%20the%20success%20path%20also%20validates%2
0that%20the%20resulting%20value%20is%20non-empty%2C%20otherwise%20this%20test%20can%20mask%20a%20real%20validation%20bug.%0A%0AValidate%20the%20correctness%20of%20the%20flagged%20issue.%20If%20correct%2C%20How%20can%20I%20resolve%20this%3F%20If%20you%20propose%20a%20fix%2C%20implement%20it%20and%20please%20make%20it%20concise.%0AOnce%20fix%20is%20implemented%2C%20also%20check%20other%20comments%20on%20the%20same%20PR%2C%20and%20ask%20user%20if%20the%20user%20wants%20to%20fix%20the%20rest%20of%20the%20comments%20as%20well.%20if%20said%20yes%2C%20then%20fetch%20all%20the%20comments%20validate%20the%20correctness%20and%20implement%20a%20minimal%20fix%0A)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/unit_tests/mcp_service/chart/test_chart_schemas.py
**Line:** 815:820
**Comment:**
*Incomplete Implementation: This test does not enforce the intended
guarantee that script-tag payloads are rejected or converted to a valid
non-empty identifier: it passes even when the sanitized name becomes an empty
string. Tighten the assertion so the success path also validates that the
resulting value is non-empty, otherwise this test can mask a real validation
bug.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39915&comment_hash=b8b82c0b540ce0c3548d41c3de9f343402611a020db4fbd396108fca13319a90&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39915&comment_hash=b8b82c0b540ce0c3548d41c3de9f343402611a020db4fbd396108fca13319a90&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]