fitzee commented on PR #39915:
URL: https://github.com/apache/superset/pull/39915#issuecomment-4432947986
## Griffen + Codex Synthesis — LGTM with nits
---
## Dual Review Synthesis
**Griffen verdict:** LGTM with nits
**Codex verdict:** LGTM with nits (no explicit blockers; one MEDIUM
hypothesis, two LOWs)
**Agreement level:** Partial — both agree no blockers; Codex caught two LOWs
Griffen missed
### Where they agreed
- Two-way door. Rollback: simple revert. Blast radius: MCP `generate_chart`
only.
- Security model is correct: sanitization replaces regex, is equivalent or
stronger for all three fields.
- `_format_single_error` extraction is sound; docstring examples are
load-bearing and correct.
- No CRITICAL or HIGH findings.
### Where they diverged
**Codex MEDIUM: relaxed punctuation acceptance (Hypothesis)**
Codex flags that removing the regex now allows characters like single
quotes, commas, parentheses, and brackets through schema validation, since
`check_sql_keywords=True` doesn't block SQL-significant punctuation — only
keywords and shell metacharacters. If any downstream SQL construction uses
these column names without proper quoting, that's an injection vector.
Griffen's read: this is a pre-existing architectural property, not a
regression introduced here. The correct invariant is quoting at SQL generation,
not format restriction at schema validation. A regex that blocks `'` while
allowing `SELECT` is not a meaningful injection guard. This doesn't block
merge, but Codex's suggestion to document the invariant explicitly ("column
fields must resolve against dataset metadata before SQL generation") is worth
doing.
### Standout findings (caught by one, missed by the other)
**LOW — Test suite doesn't cover the actual regression** *(Codex caught,
Griffen missed)*
The old regex `^[a-zA-Z0-9_][a-zA-Z0-9_\s\-\.]*$` already accepted
`1Q_revenue` (digit in `[a-zA-Z0-9_]`), `order-date` (hyphen in `\-`),
`schema.column` (dot in `\.`), and `Total Revenue` (space in `\s`). The new
tests pass under the old regex too — they don't demonstrate fixing the stated
problem. The column names that actually fail the old regex would be non-ASCII
(e.g., `café`, `客户名称`), names with parentheses, slashes, single quotes, etc.
*Mitigation:* Add at least one test with a column name the old regex would
have rejected — ideally the actual production case that motivated this PR.
**LOW — `check_sql_keywords=True` is an undocumented validation tightening
for filter/temporal columns** *(Codex caught, Griffen missed)*
`FilterConfig.column` previously had no SQL keyword check (only the regex).
Now it does. A column literally named `update`, `delete`, `create`, or `alter`
will be rejected where it was previously accepted. Word-boundary matching
(`\bDELETE\b`) means `deleted_at` or `created_date` are fine — the risk is
exact-keyword column names, which is unusual but not impossible in some
database schemas.
*Mitigation:* Add tests documenting this behavior explicitly, and mention it
in the PR description as a known tightening.
---
### Griffen's final call
**LGTM with nits.** The security model is sound and net-positive. The two
LOWs above are real but not blockers — they're documentation and test coverage
gaps, not correctness failures. The MEDIUM hypothesis about downstream quoting
is worth tracking architecturally but doesn't change anything in this PR.
Nits to address before or after merge (none are blockers):
1. Add a test with a column name the old regex actually rejected
2. Document that `check_sql_keywords=True` is a tightening for
filter/temporal columns (add to PR description or inline comment)
3. Tighten `test_script_tag_neutralized` to assert the specific failure
reason in the except branch
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]