codeant-ai-for-open-source[bot] commented on code in PR #37920:
URL: https://github.com/apache/superset/pull/37920#discussion_r2796299418


##########
superset/db_engine_specs/base.py:
##########
@@ -1280,6 +1280,109 @@ def handle_comparison_filter(
         else:
             raise ValueError(f"Invalid comparison filter operator: {op}")
 
+    # DSPM: SO-82
+    @classmethod
+    def handle_array_filter_big_query(
+        cls, sqla_col: Any, op: utils.FilterOperator, value: list[Any]
+    ) -> BinaryExpression:
+        """
+        Handle array filter operations (CONTAINS, NOT_CONTAINS, EQUALS, 
NOT_EQUALS)
+        specifically for BigQuery, with SQL injection protection.
+        """
+        # Only allow safe column names: letters, numbers, underscores
+        col_key = (
+            getattr(sqla_col, "key", None)
+            or getattr(sqla_col, "name", None)
+            or str(sqla_col)
+        )
+        if not re.match(r"^[A-Za-z_][A-Za-z0-9_]*$", col_key):
+            raise ValueError(f"Unsafe column name: {col_key}")
+        col_expr = f"`{col_key}`"
+
+        if op == utils.FilterOperator.CONTAINS:
+            clauses = []
+            params = {}
+            for i, v in enumerate(value):
+                clause = (
+                    f"EXISTS (SELECT 1 FROM UNNEST({col_expr}) AS x WHERE x = 
:val_{i})"  # nosec B608  # noqa: S608
+                )
+                clauses.append(clause)
+                params[f"val_{i}"] = v
+            sql = " AND ".join(clauses)
+            return text(sql).bindparams(*[bindparam(k, v) for k, v in 
params.items()])
+
+        elif op == utils.FilterOperator.NOT_CONTAINS:
+            clauses = []
+            params = {}
+            for i, v in enumerate(value):
+                clause = (
+                    f"NOT EXISTS (SELECT 1 FROM UNNEST({col_expr}) AS x "  # 
nosec B608  # noqa: S608
+                    f"WHERE x = :val_{i})"
+                )
+                clauses.append(clause)
+                params[f"val_{i}"] = v
+            sql = " AND ".join(clauses)
+            return text(sql).bindparams(*[bindparam(k, v) for k, v in 
params.items()])
+
+        elif op == utils.FilterOperator.EQUALS:
+            arr = value
+            if not isinstance(arr, (list, tuple)):
+                arr = [arr]
+            length_clause = f"ARRAY_LENGTH({col_expr}) = {len(arr)}"
+            element_clauses = [
+                (f"{col_expr}[OFFSET({i})] = {repr(arr[i])}") for i in 
range(len(arr))
+            ]
+            sql = f"({length_clause} AND " + " AND ".join(element_clauses) + 
")"
+            return text(sql)
+        elif op == utils.FilterOperator.NOT_EQUALS:
+            arr = value
+            if not isinstance(arr, (list, tuple)):
+                arr = [arr]
+            length_clause = f"ARRAY_LENGTH({col_expr}) != {len(arr)}"
+            element_clauses = [
+                (f"{col_expr}[OFFSET({i})] != {repr(arr[i])}") for i in 
range(len(arr))
+            ]
+            sql = f"({length_clause} OR " + " OR ".join(element_clauses) + ")"
+            return text(sql)

Review Comment:
   **Suggestion:** In the BigQuery-specific array EQUALS/NOT_EQUALS handling, 
values are injected into the SQL string via repr(), which allows crafted inputs 
to break out of the literal context and leads to SQL injection; these branches 
should use bound parameters just like the CONTAINS/NOT_CONTAINS cases. 
[security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ BigQuery array EQUALS/NOT_EQUALS filters allow SQL injection.
   - ❌ Malicious chart filters can run arbitrary SQL on BigQuery.
   - ⚠️ BigQuery-backed dashboards vulnerable when using array equality.
   ```
   </details>
   
   ```suggestion
               element_clauses: list[str] = []
               params: dict[str, Any] = {}
               for i, v in enumerate(arr):
                   element_clauses.append(f"{col_expr}[OFFSET({i})] = :val_{i}")
                   params[f"val_{i}"] = v
               sql = f"({length_clause} AND " + " AND ".join(element_clauses) + 
")"
               return text(sql).bindparams(*[bindparam(k, v) for k, v in 
params.items()])
           elif op == utils.FilterOperator.NOT_EQUALS:
               arr = value
               if not isinstance(arr, (list, tuple)):
                   arr = [arr]
               length_clause = f"ARRAY_LENGTH({col_expr}) != {len(arr)}"
               element_clauses: list[str] = []
               params: dict[str, Any] = {}
               for i, v in enumerate(arr):
                   element_clauses.append(f"{col_expr}[OFFSET({i})] != 
:val_{i}")
                   params[f"val_{i}"] = v
               sql = f"({length_clause} OR " + " OR ".join(element_clauses) + 
")"
               return text(sql).bindparams(*[bindparam(k, v) for k, v in 
params.items()])
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure or use a database whose engine spec has `engine_name == "Google 
BigQuery"`,
   so that `BaseEngineSpec.handle_array_filter` in
   `superset/db_engine_specs/base.py:1351-1384` will dispatch to
   `handle_array_filter_big_query` at 
`superset/db_engine_specs/base.py:1283-1346`.
   
   2. In Python (or via the chart data flow), call
   `BaseEngineSpec.handle_array_filter(sqla_col, utils.FilterOperator.EQUALS, 
value=arr)`
   where `sqla_col.key == "tags"` and `arr` is a list like `["x'); DROP TABLE 
sensitive;
   --"]`. This results in `handle_array_filter_big_query` executing the 
`EQUALS` branch at
   lines `1327-1336`.
   
   3. Inside `handle_array_filter_big_query`, `element_clauses` is built using 
`repr(arr[i])`
   (line `1333`), so the malicious string is interpolated directly into the SQL 
text:
   `tags[OFFSET(0)] = 'x'); DROP TABLE sensitive; --'`. No `bindparam` is used 
for this
   branch, unlike `CONTAINS`/`NOT_CONTAINS` at lines `1303-1325`.
   
   4. The constructed `TextClause` from `text(sql)` (lines `1335-1336` and 
`1345-1346`) is
   later executed via `BaseEngineSpec.execute()` in
   `superset/db_engine_specs/base.py:1150-1173` as part of normal query 
execution, sending
   attacker-controlled SQL to BigQuery without parameterization, which is a 
classic SQL
   injection pattern.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/base.py
   **Line:** 1332:1346
   **Comment:**
        *Security: In the BigQuery-specific array EQUALS/NOT_EQUALS handling, 
values are injected into the SQL string via repr(), which allows crafted inputs 
to break out of the literal context and leads to SQL injection; these branches 
should use bound parameters just like the CONTAINS/NOT_CONTAINS cases.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F37920&comment_hash=70284babd04928eaaa8904de1e3bb425ad2694db79c3a7d077b84f24f5f38636&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F37920&comment_hash=70284babd04928eaaa8904de1e3bb425ad2694db79c3a7d077b84f24f5f38636&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to