codeant-ai-for-open-source[bot] commented on code in PR #37920:
URL: https://github.com/apache/superset/pull/37920#discussion_r2796299418
##########
superset/db_engine_specs/base.py:
##########
@@ -1280,6 +1280,109 @@ def handle_comparison_filter(
else:
raise ValueError(f"Invalid comparison filter operator: {op}")
+ # DSPM: SO-82
+ @classmethod
+ def handle_array_filter_big_query(
+ cls, sqla_col: Any, op: utils.FilterOperator, value: list[Any]
+ ) -> BinaryExpression:
+ """
+ Handle array filter operations (CONTAINS, NOT_CONTAINS, EQUALS,
NOT_EQUALS)
+ specifically for BigQuery, with SQL injection protection.
+ """
+ # Only allow safe column names: letters, numbers, underscores
+ col_key = (
+ getattr(sqla_col, "key", None)
+ or getattr(sqla_col, "name", None)
+ or str(sqla_col)
+ )
+ if not re.match(r"^[A-Za-z_][A-Za-z0-9_]*$", col_key):
+ raise ValueError(f"Unsafe column name: {col_key}")
+ col_expr = f"`{col_key}`"
+
+ if op == utils.FilterOperator.CONTAINS:
+ clauses = []
+ params = {}
+ for i, v in enumerate(value):
+ clause = (
+ f"EXISTS (SELECT 1 FROM UNNEST({col_expr}) AS x WHERE x =
:val_{i})" # nosec B608 # noqa: S608
+ )
+ clauses.append(clause)
+ params[f"val_{i}"] = v
+ sql = " AND ".join(clauses)
+ return text(sql).bindparams(*[bindparam(k, v) for k, v in
params.items()])
+
+ elif op == utils.FilterOperator.NOT_CONTAINS:
+ clauses = []
+ params = {}
+ for i, v in enumerate(value):
+ clause = (
+ f"NOT EXISTS (SELECT 1 FROM UNNEST({col_expr}) AS x " #
nosec B608 # noqa: S608
+ f"WHERE x = :val_{i})"
+ )
+ clauses.append(clause)
+ params[f"val_{i}"] = v
+ sql = " AND ".join(clauses)
+ return text(sql).bindparams(*[bindparam(k, v) for k, v in
params.items()])
+
+ elif op == utils.FilterOperator.EQUALS:
+ arr = value
+ if not isinstance(arr, (list, tuple)):
+ arr = [arr]
+ length_clause = f"ARRAY_LENGTH({col_expr}) = {len(arr)}"
+ element_clauses = [
+ (f"{col_expr}[OFFSET({i})] = {repr(arr[i])}") for i in
range(len(arr))
+ ]
+ sql = f"({length_clause} AND " + " AND ".join(element_clauses) +
")"
+ return text(sql)
+ elif op == utils.FilterOperator.NOT_EQUALS:
+ arr = value
+ if not isinstance(arr, (list, tuple)):
+ arr = [arr]
+ length_clause = f"ARRAY_LENGTH({col_expr}) != {len(arr)}"
+ element_clauses = [
+ (f"{col_expr}[OFFSET({i})] != {repr(arr[i])}") for i in
range(len(arr))
+ ]
+ sql = f"({length_clause} OR " + " OR ".join(element_clauses) + ")"
+ return text(sql)
Review Comment:
**Suggestion:** In the BigQuery-specific array EQUALS/NOT_EQUALS handling,
values are injected into the SQL string via repr(), which allows crafted inputs
to break out of the literal context and leads to SQL injection; these branches
should use bound parameters just like the CONTAINS/NOT_CONTAINS cases.
[security]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ BigQuery array EQUALS/NOT_EQUALS filters allow SQL injection.
- ❌ Malicious chart filters can run arbitrary SQL on BigQuery.
- ⚠️ BigQuery-backed dashboards vulnerable when using array equality.
```
</details>
```suggestion
element_clauses: list[str] = []
params: dict[str, Any] = {}
for i, v in enumerate(arr):
element_clauses.append(f"{col_expr}[OFFSET({i})] = :val_{i}")
params[f"val_{i}"] = v
sql = f"({length_clause} AND " + " AND ".join(element_clauses) +
")"
return text(sql).bindparams(*[bindparam(k, v) for k, v in
params.items()])
elif op == utils.FilterOperator.NOT_EQUALS:
arr = value
if not isinstance(arr, (list, tuple)):
arr = [arr]
length_clause = f"ARRAY_LENGTH({col_expr}) != {len(arr)}"
element_clauses: list[str] = []
params: dict[str, Any] = {}
for i, v in enumerate(arr):
element_clauses.append(f"{col_expr}[OFFSET({i})] !=
:val_{i}")
params[f"val_{i}"] = v
sql = f"({length_clause} OR " + " OR ".join(element_clauses) +
")"
return text(sql).bindparams(*[bindparam(k, v) for k, v in
params.items()])
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure or use a database whose engine spec has `engine_name == "Google
BigQuery"`,
so that `BaseEngineSpec.handle_array_filter` in
`superset/db_engine_specs/base.py:1351-1384` will dispatch to
`handle_array_filter_big_query` at
`superset/db_engine_specs/base.py:1283-1346`.
2. In Python (or via the chart data flow), call
`BaseEngineSpec.handle_array_filter(sqla_col, utils.FilterOperator.EQUALS,
value=arr)`
where `sqla_col.key == "tags"` and `arr` is a list like `["x'); DROP TABLE
sensitive;
--"]`. This results in `handle_array_filter_big_query` executing the
`EQUALS` branch at
lines `1327-1336`.
3. Inside `handle_array_filter_big_query`, `element_clauses` is built using
`repr(arr[i])`
(line `1333`), so the malicious string is interpolated directly into the SQL
text:
`tags[OFFSET(0)] = 'x'); DROP TABLE sensitive; --'`. No `bindparam` is used
for this
branch, unlike `CONTAINS`/`NOT_CONTAINS` at lines `1303-1325`.
4. The constructed `TextClause` from `text(sql)` (lines `1335-1336` and
`1345-1346`) is
later executed via `BaseEngineSpec.execute()` in
`superset/db_engine_specs/base.py:1150-1173` as part of normal query
execution, sending
attacker-controlled SQL to BigQuery without parameterization, which is a
classic SQL
injection pattern.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/db_engine_specs/base.py
**Line:** 1332:1346
**Comment:**
*Security: In the BigQuery-specific array EQUALS/NOT_EQUALS handling,
values are injected into the SQL string via repr(), which allows crafted inputs
to break out of the literal context and leads to SQL injection; these branches
should use bound parameters just like the CONTAINS/NOT_CONTAINS cases.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F37920&comment_hash=70284babd04928eaaa8904de1e3bb425ad2694db79c3a7d077b84f24f5f38636&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F37920&comment_hash=70284babd04928eaaa8904de1e3bb425ad2694db79c3a7d077b84f24f5f38636&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]