codeant-ai-for-open-source[bot] commented on code in PR #40567:
URL: https://github.com/apache/superset/pull/40567#discussion_r3348259583


##########
superset/models/helpers.py:
##########
@@ -1459,6 +1459,23 @@ def query(self, query_obj: QueryObjectDict) -> QueryResult:
         qry_start_dttm = datetime.now()
         query_str_ext = self.get_query_str_extended(query_obj)
         sql = query_str_ext.sql
+
+        # Mirror the DISALLOWED_SQL_* gate that sql_lab.execute_sql_statement
+        # enforces so both query surfaces honour the same denylist.
+        engine = self.db_engine_spec.engine
+        disallowed_functions = app.config["DISALLOWED_SQL_FUNCTIONS"].get(engine, set())
+        disallowed_tables = app.config["DISALLOWED_SQL_TABLES"].get(engine, set())
+        if disallowed_functions or disallowed_tables:
+            parsed_script = SQLScript(sql, engine=engine)
+            if disallowed_functions and parsed_script.check_functions_present(
+                disallowed_functions
+            ):
+                raise SupersetDisallowedSQLFunctionException(disallowed_functions)
+            if disallowed_tables and parsed_script.check_tables_present(
+                disallowed_tables
+            ):
+                raise SupersetDisallowedSQLTableException(disallowed_tables)

Review Comment:
   **Suggestion:** The table denylist branch raises 
`SupersetDisallowedSQLTableException` with the full configured denylist instead 
of only the tables actually present in the rendered SQL. This leaks internal 
security policy details to users through error messages and diverges from the 
SQL Lab behavior (which reports only matched tables). Compute the intersection 
of parsed table names and the denylist, and raise with that matched subset. 
[security]
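
The suggested intersection could look roughly like the sketch below. It is not the Superset implementation: `DisallowedTableError` is a hypothetical stand-in for `SupersetDisallowedSQLTableException`, the function names are illustrative, and the set of table names referenced by the query is passed in directly because the `SQLScript` call that would produce it is not shown in this diff. Case-insensitive matching is also an assumption.

```python
from __future__ import annotations


class DisallowedTableError(Exception):
    """Hypothetical stand-in for SupersetDisallowedSQLTableException."""

    def __init__(self, tables: set[str]) -> None:
        self.tables = tables
        # Sort the names so the message is stable across runs.
        super().__init__(
            "SQL statement references disallowed table(s): "
            + ", ".join(sorted(tables))
        )


def matched_disallowed_tables(referenced: set[str], denylist: set[str]) -> set[str]:
    """Return only the denylisted tables that the query actually references.

    Names are compared case-insensitively (an assumption of this sketch).
    """
    denied = {name.lower() for name in denylist}
    return {name for name in referenced if name.lower() in denied}


def enforce_table_denylist(referenced: set[str], denylist: set[str]) -> None:
    """Raise with the matched subset, never with the full configured denylist."""
    matched = matched_disallowed_tables(referenced, denylist)
    if matched:
        raise DisallowedTableError(matched)
```

With this shape, a query that touches only `pg_authid` produces an error naming `pg_authid` alone, so the rest of the configured denylist stays out of the message.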
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Chart data API leaks full denylisted table names.
   - ⚠️ Users learn operator's internal SQL table security policy.
   - ⚠️ Behavior inconsistent with SQL Lab disallowed-table error messaging.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure multiple disallowed tables for PostgreSQL in the Flask app config, e.g. set
   `current_app.config["DISALLOWED_SQL_TABLES"] = {"postgresql": {"pg_authid",
   "pg_shadow", "pg_stat_activity"}}` as done via `_patch_disallowed` in
   `tests/unit_tests/models/helpers_test.py:2714-2721`.
   
   2. Build a SQLA dataset backed by a PostgreSQL database and engine `"postgresql"` using
   `SqlaTable` (see `_build_sqla_table_for_query` in
   `tests/unit_tests/connectors/sqla/models_test.py:93-116`, which sets
   `db_engine_spec.engine = "postgresql"` and patches `get_query_str_extended` to return a
   specific SQL string).
   
   3. Issue a chart data request that compiles to a query referencing only one of the
   disallowed tables, for example `"SELECT rolname FROM pg_authid"`, causing
   `SqlaTable.query()` (inherited from `ExploreMixin.query` at
   `superset/models/helpers.py:1452-1498`) to execute for that datasource via the
   `BaseViz.get_df` call at `superset/viz.py:268-47` and the chart-data endpoint
   `ChartDataRestApi.get_data` at `superset/charts/data/api.py:78-157`.
   
   4. In `ExploreMixin.query`, the denylist gate at `superset/models/helpers.py:86-98`
   computes `disallowed_tables = app.config["DISALLOWED_SQL_TABLES"].get(engine, set())`
   (all three tables). `parsed_script.check_tables_present(disallowed_tables)` returns
   `True` because `pg_authid` is present, so the code raises
   `SupersetDisallowedSQLTableException(disallowed_tables)` (line 1477). The exception's
   constructor in `superset/exceptions.py:4-16` formats the message `SQL statement
   references disallowed table(s): {tables}` with the entire configured set
   `{"pg_authid", "pg_shadow", "pg_stat_activity"}`. This exposes denylisted tables that
   the user's query never referenced, unlike the existing `_process_sql_expression` path
   tested in `tests/unit_tests/models/helpers_test.py:61-29`, which asserts that only the
   offending table name appears.
   ```
   </details>
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

