codeant-ai-for-open-source[bot] commented on code in PR #39301:
URL: https://github.com/apache/superset/pull/39301#discussion_r3408323739


##########
superset/db_engine_specs/impala.py:
##########
@@ -209,8 +210,25 @@ def cancel_query(cls, cursor: Any, query: Query, 
cancel_query_id: str) -> bool:
         """
         try:
             impala_host = query.database.url_object.host
+            # The cancel call issues an outbound HTTP request from the
+            # Superset backend to whatever host the DB connection was
+            # configured with; validate it before the call to keep this
+            # path consistent with the dataset-import and webhook URL
+            # checks. Operators with internal Impala targets can opt out
+            # via IMPALA_CANCEL_QUERY_ALLOW_INTERNAL_HOSTS.
+            if not impala_host:
+                return False
+            if not app.config[
+                "IMPALA_CANCEL_QUERY_ALLOW_INTERNAL_HOSTS"
+            ] and not is_safe_host(impala_host):
+                logger.warning(
+                    "Impala cancel_query refused: target host is not allowed"
+                )
+                return False
             url = 
f"http://{impala_host}:25000/cancel_query?query_id={cancel_query_id}";
-            response = requests.post(url, timeout=3)
+            # Do not follow redirects: a validated host could otherwise 30x the
+            # request to an internal target, bypassing the is_safe_host check.

Review Comment:
   **Suggestion:** This host validation is vulnerable to DNS rebinding because 
`is_safe_host()` resolves DNS first, but `requests.post()` performs a fresh DNS 
resolution when sending the request. An attacker controlling DNS for a hostname 
can return a public IP during validation and then switch to an internal IP 
before the HTTP call, bypassing the SSRF guard. Resolve once and pin the 
outbound connection to the validated IP (or use a transport that enforces 
resolved-address pinning) instead of validating and then reconnecting by 
hostname. [ssrf]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Impala cancel_query SSRF guard bypassable via DNS rebinding.
   - ⚠️ Backend can issue HTTP requests to internal network hosts.
   - ⚠️ Private services exposed whenever Impala hostname uses attacker DNS.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. A client calls the queries API stop endpoint (`@expose("/stop", 
methods=("POST",))`)
   implemented in `superset/queries/api.py:232-271`, which deserializes the 
JSON body and
   invokes `QueryDAO.stop_query(body["client_id"])` at 
`superset/queries/api.py:10-13`.
   
   2. `QueryDAO.stop_query` (in `superset/daos/query.py:60-24`) loads the 
`Query` row for the
   given `client_id` and, if the query is still running, calls 
`sql_lab.cancel_query(query)`
   at `superset/daos/query.py:20-21`.
   
   3. `sql_lab.cancel_query` (in `superset/sql_lab.py:5-39`) checks for an 
existing cancel
   key in `query.extra` and then creates a new SQLAlchemy engine and cursor with
   `query.database.get_sqla_engine(...)`, finally delegating to
   `query.database.db_engine_spec.cancel_query(cursor, query, cancel_query_id)` 
at
   `superset/sql_lab.py:32-39`. For an Impala database, this calls
   `ImpalaEngineSpec.cancel_query` implemented in 
`superset/db_engine_specs/impala.py:23-56`.
   
   4. Inside `ImpalaEngineSpec.cancel_query` 
(`superset/db_engine_specs/impala.py:33-48`),
   the code reads `impala_host = query.database.url_object.host` and applies 
the host check:
   `if not app.config["IMPALA_CANCEL_QUERY_ALLOW_INTERNAL_HOSTS"] and not
   is_safe_host(impala_host): ...`. `is_safe_host` (defined in
   `superset/utils/network.py:8-34`) calls `socket.getaddrinfo(host, None)` at 
line 18 and
   verifies each returned IP is globally routable; if an attacker-controlled 
DNS server for
   `impala_host` returns a public IP at this point, `is_safe_host` returns 
`True` and the
   guard passes.
   
   5. After validation, `cancel_query` constructs `url =
   f"http://{impala_host}:25000/cancel_query?query_id={cancel_query_id}"` and 
issues
   `requests.post(url, timeout=3, allow_redirects=False)` at
   `superset/db_engine_specs/impala.py:49-52`. The `requests` library performs 
its own DNS
   resolution for `impala_host` when opening the HTTP connection rather than 
reusing the
   earlier `socket.getaddrinfo` result. An attacker who controls DNS for 
`impala_host` can
   perform DNS rebinding: first returning a public IP during `is_safe_host` so 
the check
   passes, then quickly changing the DNS record so that the subsequent 
resolution performed
   by `requests.post` points to a private or loopback address. This causes the 
Superset
   backend to send the HTTP cancel request to an internal host despite the SSRF 
guard,
   because validation and connection use separate, unpinned DNS lookups and
   `allow_redirects=False` only blocks redirect-based bypasses, not this 
rebinding window.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=889b60e7814241a08d48526f4cead6f8&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=889b60e7814241a08d48526f4cead6f8&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/impala.py
   **Line:** 221:230
   **Comment:**
        *Ssrf: This host validation is vulnerable to DNS rebinding because 
`is_safe_host()` resolves DNS first, but `requests.post()` performs a fresh DNS 
resolution when sending the request. An attacker controlling DNS for a hostname 
can return a public IP during validation and then switch to an internal IP 
before the HTTP call, bypassing the SSRF guard. Resolve once and pin the 
outbound connection to the validated IP (or use a transport that enforces 
resolved-address pinning) instead of validating and then reconnecting by 
hostname.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39301&comment_hash=ccb9a5ba4b573dc5904f5cdc2bf41f4cfdf87c17421a24d26715ea4203efe371&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39301&comment_hash=ccb9a5ba4b573dc5904f5cdc2bf41f4cfdf87c17421a24d26715ea4203efe371&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to