Re: [PR] fix(mcp): relax column name regex, improve generate_chart validation errors and examples [superset]

via GitHub Tue, 12 May 2026 10:10:46 -0700


fitzee commented on PR #39915:
URL: https://github.com/apache/superset/pull/39915#issuecomment-4432947986


   ## Griffen + Codex Synthesis — LGTM with nits
   
   ---
   
   ## Dual Review Synthesis
   
   **Griffen verdict:** LGTM with nits
   **Codex verdict:** LGTM with nits (no explicit blockers; one MEDIUM 
hypothesis, two LOWs)
   **Agreement level:** Partial — both agree no blockers; Codex caught two LOWs 
Griffen missed
   
   ### Where they agreed
   - Two-way door. Rollback: simple revert. Blast radius: MCP `generate_chart` 
only.
   - Security model is correct: sanitization replaces regex, is equivalent or 
stronger for all three fields.
   - `_format_single_error` extraction is sound; docstring examples are 
load-bearing and correct.
   - No CRITICAL or HIGH findings.
   
   ### Where they diverged
   
   **Codex MEDIUM: relaxed punctuation acceptance (Hypothesis)**
   Codex flags that removing the regex now allows characters like single 
quotes, commas, parentheses, and brackets through schema validation, since 
`check_sql_keywords=True` doesn't block SQL-significant punctuation — only 
keywords and shell metacharacters. If any downstream SQL construction uses 
these column names without proper quoting, that's an injection vector.
   
   Griffen's read: this is a pre-existing architectural property, not a 
regression introduced here. The correct invariant is quoting at SQL generation, 
not format restriction at schema validation. A regex that blocks `'` while 
allowing `SELECT` is not a meaningful injection guard. This doesn't block 
merge, but Codex's suggestion to document the invariant explicitly ("column 
fields must resolve against dataset metadata before SQL generation") is worth 
doing.
   
   ### Standout findings (caught by one, missed by the other)
   
   **LOW — Test suite doesn't cover the actual regression** *(Codex caught, 
Griffen missed)*
   The old regex `^[a-zA-Z0-9_][a-zA-Z0-9_\s\-\.]*$` already accepted 
`1Q_revenue` (digit in `[a-zA-Z0-9_]`), `order-date` (hyphen in `\-`), 
`schema.column` (dot in `\.`), and `Total Revenue` (space in `\s`). The new 
tests pass under the old regex too — they don't demonstrate fixing the stated 
problem. The column names that actually fail the old regex would be non-ASCII 
(e.g., `café`, `客户名称`), names with parentheses, slashes, single quotes, etc.
   *Mitigation:* Add at least one test with a column name the old regex would 
have rejected — ideally the actual production case that motivated this PR.
   
   **LOW — `check_sql_keywords=True` is an undocumented validation tightening 
for filter/temporal columns** *(Codex caught, Griffen missed)*
   `FilterConfig.column` previously had no SQL keyword check (only the regex). 
Now it does. A column literally named `update`, `delete`, `create`, or `alter` 
will be rejected where it was previously accepted. Word-boundary matching 
(`\bDELETE\b`) means `deleted_at` or `created_date` are fine — the risk is 
exact-keyword column names, which is unusual but not impossible in some 
database schemas.
   *Mitigation:* Add tests documenting this behavior explicitly, and mention it 
in the PR description as a known tightening.
   
   ---
   
   ### Griffen's final call
   
   **LGTM with nits.** The security model is sound and net-positive. The two 
LOWs above are real but not blockers — they're documentation and test coverage 
gaps, not correctness failures. The MEDIUM hypothesis about downstream quoting 
is worth tracking architecturally but doesn't change anything in this PR.
   
   Nits to address before or after merge (none are blockers):
   1. Add a test with a column name the old regex actually rejected
   2. Document that `check_sql_keywords=True` is a tightening for 
filter/temporal columns (add to PR description or inline comment)
   3. Tighten `test_script_tag_neutralized` to assert the specific failure 
reason in the except branch


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(mcp): relax column name regex, improve generate_chart validation errors and examples [superset]

Reply via email to