codeant-ai-for-open-source[bot] commented on code in PR #40665:
URL: https://github.com/apache/superset/pull/40665#discussion_r3338230631


##########
superset/databases/schemas.py:
##########
@@ -449,7 +449,21 @@ class DatabaseSSHTunnel(Schema):
     id = fields.Integer(
         allow_none=True, metadata={"description": "SSH Tunnel ID (for updates)"}
     )
-    server_address = fields.String()
+    # Restrict the SSH tunnel host to a plausible hostname / IP literal. This
+    # rejects values carrying URL structure, whitespace, or path separators —
+    # defense in depth against using the tunnel host as an SSRF vector.
+    server_address = fields.String(
+        validate=[
+            Length(min=1, max=256),
+            Regexp(
+                r"^[A-Za-z0-9._:\-\[\]]+$",
+                error=(
+                    "server_address must be a valid hostname or IP address "
+                    "(letters, digits, '.', '-', ':' only)"
+                ),
+            ),

Review Comment:
   **Suggestion:** The new hostname regex is too permissive and accepts 
non-host literals like `db.example.com:22` (or malformed bracket usage), even 
though `server_port` is provided separately. Those values pass schema 
validation but then fail later when the SSH tunnel tries DNS resolution on an 
invalid host string. Tighten validation so `:`/`[]` are only allowed for valid 
IPv6 forms, and reject host+port formats in `server_address`. [incorrect 
condition logic]
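
One way the tightened check could look is sketched below, using only the Python standard library. The function name `is_valid_server_address` and the `_LABEL` / `_HOSTNAME_RE` helpers are hypothetical and are not part of the PR. The sketch assumes that a bracketed IPv6 literal such as `[::1]` should be accepted and that underscores stay legal inside hostname labels, as in the current pattern; neither assumption has been checked against what the SSH tunnel library accepts.

```python
import ipaddress
import re

# Hypothetical helpers, not from the PR.
# One DNS-style label: 1-63 chars, starts and ends with an alphanumeric,
# with '-' or '_' allowed only in the middle.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9_-]{0,61}[A-Za-z0-9])?"
_HOSTNAME_RE = re.compile(rf"{_LABEL}(?:\.{_LABEL})*\.?")


def is_valid_server_address(value: str) -> bool:
    """Return True for a bare hostname, an IPv4 literal, or an IPv6 literal."""
    if not 1 <= len(value) <= 256:
        return False
    # Strip one pair of surrounding brackets, e.g. "[::1]" -> "::1".
    bracketed = value.startswith("[") and value.endswith("]")
    candidate = value[1:-1] if bracketed else value
    # Colons and brackets are only meaningful for IPv6 literals, so anything
    # containing them must parse as IPv6. This rejects "host:22".
    if bracketed or ":" in candidate:
        try:
            return ipaddress.ip_address(candidate).version == 6
        except ValueError:
            return False
    # No colon and no brackets: hostname or dotted IPv4 literal.
    return _HOSTNAME_RE.fullmatch(value) is not None
```

With this check, `bastion.example.com`, `10.0.0.1`, `::1` and `[2001:db8::1]` are accepted, while `bastion.example.com:22`, `[10.0.0.1]` and `[bastion` are rejected. In the schema, a function like this would need to be wrapped in a validator that raises `ValidationError`; that wiring is not shown here.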
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ SSH test_connection fails late for misformatted tunnel hosts.
   - ⚠️ Queries using SSH tunnels error until host corrected.
   - ⚠️ Error surface less clear; misconfig harder to diagnose.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Note in `superset/databases/schemas.py:448-466` that `DatabaseSSHTunnel.server_address`
   uses `Regexp(r"^[A-Za-z0-9._:\-\[\]]+$")` with `Length(1, 256)`, allowing arbitrary
   placement of `:`, `[` and `]` (e.g. `"bastion.example.com:22"`), while `server_port` is a
   separate `Integer` field.

   2. See that `DatabaseTestConnectionSchema` in `superset/databases/schemas.py:32-61` and
   `DatabasePostSchema`/`DatabasePutSchema` in `superset/databases/schemas.py:520-579` all
   declare `ssh_tunnel = fields.Nested(DatabaseSSHTunnel, allow_none=True)`, so this regex
   governs SSH tunnel host validation for both `/database` create/update and
   `test_connection` flows.

   3. In `superset/databases/api.py:1240-1319`, `DatabaseRestApi.test_connection` loads the
   JSON body via `DatabaseTestConnectionSchema().load(request.json)` and then calls
   `TestConnectionDatabaseCommand(item).run()`. A payload containing `"ssh_tunnel":
   {"server_address": "bastion.example.com:22", "server_port": 22, "username": "u",
   "password": "p"}` passes schema validation because `"bastion.example.com:22"` matches the
   current regex and `server_port` is a valid integer.

   4. `TestConnectionDatabaseCommand.run`
   (`superset/commands/database/test_connection.py:91-129`) builds a `Database` with
   `ssh_tunnel=SSHTunnel(**ssh_tunnel_properties)` via
   `DatabaseDAO.build_db_for_connection_test` (`superset/daos/database.py:94-107`); later,
   `Database.get_sqla_engine` (`superset/models/core.py:430-48`) calls
   `ssh_manager_factory.instance.create_tunnel(ssh_tunnel=self.ssh_tunnel,
   sqlalchemy_database_uri=sqlalchemy_uri)`, and `SSHManager.create_tunnel`
   (`superset/extensions/ssh.py:51-80`) passes
   `ssh_address_or_host=(ssh_tunnel.server_address, ssh_tunnel.server_port)` directly to
   `sshtunnel.open_tunnel`. With `server_address="bastion.example.com:22"`, the underlying
   library attempts DNS resolution on the invalid host string `"bastion.example.com:22"` and
   fails at tunnel-creation time, producing a connection/test error that could have been
   rejected earlier by stricter host validation that disallows host+port forms and malformed
   use of `:`/`[]`.
   ```
   </details>
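
The claim in step 1 can be checked in isolation with the standard library. The snippet below copies the pattern from the diff and runs it against a few sample strings; the sample values are illustrative and do not come from the PR.

```python
import re

# Pattern copied verbatim from the diff under review.
CURRENT_PATTERN = re.compile(r"^[A-Za-z0-9._:\-\[\]]+$")

# Illustrative inputs (assumptions, not taken from the PR).
samples = [
    "bastion.example.com",         # plain hostname
    "bastion.example.com:22",      # host+port, should be rejected
    "]weird[",                     # malformed bracket usage
    "::::",                        # colons only, not a valid IPv6 literal
    "http://bastion.example.com",  # URL structure
]

# Keep only the samples the current pattern lets through.
accepted = [s for s in samples if CURRENT_PATTERN.match(s)]
print(accepted)
```

Only the URL form is rejected, because `/` is outside the character class. The host+port string, the stray brackets and the colon-only string all pass, which matches the behaviour described in the steps above.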
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


