codeant-ai-for-open-source[bot] commented on code in PR #40673:
URL: https://github.com/apache/superset/pull/40673#discussion_r3338722375


##########
superset/extensions/ssh.py:
##########
@@ -60,6 +152,10 @@ def create_tunnel(
         port = url.port or get_default_port(backend)
         if not port:
             raise SSHTunnelDatabasePortError()
+
+        # Opt-in host-key verification runs before the tunnel is opened.
+        self._verify_host_key(ssh_tunnel)

Review Comment:
   **Suggestion:** The new pre-check call introduces 
`SSHTunnelHostKeyVerificationError` into the tunnel creation path, but existing 
create/update/test-connection handlers are not wired to treat this new 
exception like other SSH tunnel validation failures. In practice this gets 
surfaced as a generic unexpected/connection error instead of a deterministic 
SSH host-key validation response. Update the command/API exception handling 
paths to explicitly propagate this new exception with the intended 
status/message. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Database test connection API hides host-key verification failures.
   - ❌ Database create fails with generic connection error on host-key 
mismatches.
   - ⚠️ Debugging SSH host key config becomes harder for operators.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure an SSH tunnel with an invalid or mismatched host key, for 
example
   `server_host_key="not-a-valid-key"` as in the unit test at
   `tests/unit_tests/extensions/ssh_test.py:63-69`, or by setting a different 
key than the
   SSH server actually presents. Ensure the tunnel is attached to a `Database` 
model via
   `Database.ssh_tunnel` (referenced in `superset/models/core.py:14-23`).
   
   2. Trigger a connection test through the API endpoint implemented in
   `superset/databases/api.py:16-24` by calling `POST 
/api/v1/database/test_connection` with
   JSON matching `DatabaseTestConnectionSchema`, including the `ssh_tunnel` 
payload so that
   `TestConnectionDatabaseCommand(item).run()` is invoked.
   
   3. Inside `TestConnectionDatabaseCommand.run`
   (`superset/commands/database/test_connection.py:91-143`), the command builds 
a `Database`
   via `DatabaseDAO.build_db_for_connection_test(..., 
ssh_tunnel=ssh_tunnel_properties)` at
   `test_connection.py:123-129`, then calls `with database.get_sqla_engine() as 
engine:` at
   `test_connection.py:142`. `Database.get_sqla_engine` 
(`superset/models/core.py:14-25`)
   calls 
`ssh_manager_factory.instance.create_tunnel(ssh_tunnel=self.ssh_tunnel, ...)` at
   `core.py:17-20`.
   
   4. `SSHManager.create_tunnel` (`superset/extensions/ssh.py:143-159`) calls
   `self._verify_host_key(ssh_tunnel)` at `ssh.py:157`. In `_verify_host_key`
   (`ssh.py:81-141`), a bad or missing expected key results in
   `SSHTunnelHostKeyVerificationError` being raised (see 
`ssh_tunnel/exceptions.py:80-83`).
   This exception is not explicitly handled in 
`TestConnectionDatabaseCommand.run`: it falls
   through to the broad `except Exception as ex:` at 
`test_connection.py:235-252`, which logs
   and wraps it into `DatabaseTestConnectionUnexpectedError`
   (`commands/database/exceptions.py:175-177`). The test-connection API method 
in
   `superset/databases/api.py:16-25` only catches `(SSHTunnelingNotEnabledError,
   SSHTunnelDatabasePortError)` at `api.py:24-25` and has no generic 
`SupersetException`
   handler, so `DatabaseTestConnectionUnexpectedError` bubbles up as a generic 
unexpected
   error/500 instead of a deterministic `SSHTunnelHostKeyVerificationError` 
message. Similar
   behavior occurs in create/update flows: `CreateDatabaseCommand.run` at
   `commands/database/create.py:7-25` uses `TestConnectionDatabaseCommand` and 
re-raises the
   wrapped unexpected error, and the database API create handler at 
`databases/api.py:1-40`
   only treats `(SSHTunnelingNotEnabledError, SSHTunnelDatabasePortError)` 
specially, leaving
   host-key verification failures to be surfaced as generic connection or 
unexpected errors.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=0cb36f9ffa2347cc98966f143234fd0d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=0cb36f9ffa2347cc98966f143234fd0d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/extensions/ssh.py
   **Line:** 157:157
   **Comment:**
        *Api Mismatch: The new pre-check call introduces 
`SSHTunnelHostKeyVerificationError` into the tunnel creation path, but existing 
create/update/test-connection handlers are not wired to treat this new 
exception like other SSH tunnel validation failures. In practice this gets 
surfaced as a generic unexpected/connection error instead of a deterministic 
SSH host-key validation response. Update the command/API exception handling 
paths to explicitly propagate this new exception with the intended 
status/message.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40673&comment_hash=7a083af34dc8cc71b89b685c17e44d997309b0227425c9aa44789852a7d1c892&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40673&comment_hash=7a083af34dc8cc71b89b685c17e44d997309b0227425c9aa44789852a7d1c892&reaction=dislike'>👎</a>



##########
superset/extensions/ssh.py:
##########
@@ -48,6 +78,68 @@ def build_sqla_url(
             port=server.local_bind_port,
         )
 
+    def _verify_host_key(self, ssh_tunnel: "SSHTunnel") -> None:
+        """
+        Opt-in defense-in-depth: verify the SSH server's host key before 
opening the
+        tunnel, to resist man-in-the-middle attacks (paramiko's ``Transport`` 
does no
+        known-hosts checking by default).
+
+        Behavior:
+
+        - If the tunnel declares an expected ``server_host_key``, connect to 
the SSH
+          server, read the host key it presents, and compare. On mismatch (or 
if the
+          expected key cannot be parsed) raise
+          :class:`SSHTunnelHostKeyVerificationError`.
+        - If no expected key is set and 
``SSH_TUNNEL_STRICT_HOST_KEY_CHECKING`` is
+          enabled, fail closed and raise.
+        - If no expected key is set and strict checking is disabled, do 
nothing,
+          preserving existing (unverified) behavior.
+        """
+        expected_raw = ssh_tunnel.server_host_key
+
+        if not expected_raw or not expected_raw.strip():
+            if self.strict_host_key_checking:
+                raise SSHTunnelHostKeyVerificationError(
+                    message=(
+                        "SSH_TUNNEL_STRICT_HOST_KEY_CHECKING is enabled but no 
"
+                        "expected server host key is configured for this 
tunnel."
+                    )
+                )
+            return
+
+        try:
+            expected_key = _parse_authorized_key(expected_raw)
+        except ValueError as ex:
+            raise SSHTunnelHostKeyVerificationError(
+                message=f"The configured expected server host key is invalid: 
{ex}"
+            ) from ex
+
+        transport = paramiko.Transport(
+            (ssh_tunnel.server_address, ssh_tunnel.server_port)
+        )

Review Comment:
   **Suggestion:** `paramiko.Transport((host, port))` opens the TCP connection 
before `start_client(timeout=...)`, so this path does not apply an explicit 
connect timeout and can block for OS-level TCP timeout intervals on unreachable 
hosts. That can stall requests when host-key verification is enabled. Use an 
explicitly timeout-bound socket (or equivalent timeout-aware connect path) 
before creating `Transport`. [performance]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Database test connection requests can hang for long durations.
   - ⚠️ Regular query execution over SSH may experience unexpected delays.
   - ⚠️ SSH tunnel host-key path ignores configured socket timeout.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Enable host-key verification by configuring an SSH tunnel with a non-empty
   `server_host_key` field (stored on the `SSHTunnel` model referenced via 
TYPE_CHECKING in
   `superset/extensions/ssh.py:36-37`) and associate it with a `Database` so 
that
   `Database.ssh_tunnel` is non-null (`superset/models/core.py:14-23`).
   
   2. Point the tunnel's `server_address`/`server_port` to an SSH host that is 
unreachable or
   silently drops TCP SYN packets (for example, a firewalled IP), so that TCP 
connect
   attempts do not fail quickly.
   
   3. Invoke a code path that opens a SQLAlchemy engine using this database, 
such as the
   test-connection API `POST /api/v1/database/test_connection` implemented in
   `superset/databases/api.py:16-24`, which calls
   `TestConnectionDatabaseCommand(item).run()`. Inside `run`
   (`superset/commands/database/test_connection.py:91-143`), the command builds 
a `Database`
   with `ssh_tunnel` and then calls `with database.get_sqla_engine() as 
engine:` at
   `test_connection.py:142`, which in turn calls
   `ssh_manager_factory.instance.create_tunnel(...)` as shown in
   `superset/models/core.py:16-23`.
   
   4. `SSHManager.create_tunnel` in `superset/extensions/ssh.py:143-159` calls
   `self._verify_host_key(ssh_tunnel)` at `ssh.py:157`. `_verify_host_key` 
(`ssh.py:81-141`)
   constructs a `paramiko.Transport((ssh_tunnel.server_address, 
ssh_tunnel.server_port))` at
   `ssh.py:117-119` and only then calls
   `transport.start_client(timeout=sshtunnel.SSH_TIMEOUT)` at `ssh.py:121`. 
Because the
   `Transport` constructor performs the underlying TCP connect when given a 
`(host, port)`
   tuple, and no explicit `socket.settimeout` or connect timeout is applied 
here, this
   connect can block until the OS-level TCP timeout for unreachable hosts. The 
Superset
   config at `superset/config.py:5-10` defines `SSH_TUNNEL_PACKET_TIMEOUT_SEC` 
as "Timeout
   (seconds) for transport socket (socket.settimeout)" and uses it to set
   `sshtunnel.SSH_TIMEOUT` in `SSHManager.__init__` (`ssh.py:63-69`), but that 
timeout only
   constrains `start_client`, not the initial TCP connect in `Transport((host, 
port))`, so
   requests that hit this path can stall significantly longer than the 
configured SSH tunnel
   packet timeout when host-key verification is enabled.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=1e9e2f6bf33f49cfa95f3915fa380089&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=1e9e2f6bf33f49cfa95f3915fa380089&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/extensions/ssh.py
   **Line:** 117:119
   **Comment:**
        *Performance: `paramiko.Transport((host, port))` opens the TCP 
connection before `start_client(timeout=...)`, so this path does not apply an 
explicit connect timeout and can block for OS-level TCP timeout intervals on 
unreachable hosts. That can stall requests when host-key verification is 
enabled. Use an explicitly timeout-bound socket (or equivalent timeout-aware 
connect path) before creating `Transport`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40673&comment_hash=b95f0893bc499fe0da02fe8dd9ee8ad7f046f2e7b1eb2e613bf8a774d9cd3823&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40673&comment_hash=b95f0893bc499fe0da02fe8dd9ee8ad7f046f2e7b1eb2e613bf8a774d9cd3823&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to