codeant-ai-for-open-source[bot] commented on code in PR #35021:
URL: https://github.com/apache/superset/pull/35021#discussion_r2873242319


##########
superset-frontend/src/pages/RedirectWarning/utils.ts:
##########
@@ -0,0 +1,94 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+const TRUSTED_URLS_KEY = 'superset_trusted_urls';
+const MAX_TRUSTED_URLS = 100;
+const ALLOWED_SCHEMES = ['http:', 'https:'];
+
+/**
+ * Normalize a URL for comparison (origin + path without trailing slash + 
search).
+ */
+function normalizeUrl(url: string): string {
+  try {
+    const parsed = new URL(url);
+    return parsed.origin + parsed.pathname.replace(/\/$/, '') + parsed.search;
+  } catch {
+    return url;
+  }
+}
+
+/**
+ * Return true if the URL scheme is safe for navigation.
+ * Blocks javascript:, data:, vbscript:, file:, etc.
+ */
+export function isAllowedScheme(url: string): boolean {
+  try {
+    const parsed = new URL(url);
+    return ALLOWED_SCHEMES.includes(parsed.protocol);
+  } catch {
+    // relative URLs or unparseable — allow (they'll resolve against current 
origin)
+    return true;
+  }
+}
+
+/**
+ * Read the target URL from the current page's query string.
+ *
+ * URLSearchParams.get() already percent-decodes the value, so we must NOT
+ * call decodeURIComponent again (doing so would allow double-encoded
+ * payloads like `javascript%253Aalert(1)` to bypass scheme checks).
+ */
+export function getTargetUrl(): string {
+  const params = new URLSearchParams(window.location.search);
+  const url = params.get('url') ?? '';
+  return url.trim();
+}
+
+function getTrustedUrls(): string[] {
+  try {
+    const stored = localStorage.getItem(TRUSTED_URLS_KEY);
+    return stored ? JSON.parse(stored) : [];

Review Comment:
   **Suggestion:** The `getTrustedUrls` function assumes `JSON.parse` returns 
an array, but if localStorage has been tampered with (e.g., contains a string 
or object instead of an array), calling `.some()` on the result will throw a 
TypeError. Add validation to ensure the parsed value is actually an array 
before returning it. [possible bug]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ External link redirect page crashes preventing access to external URLs.
   - ⚠️ Users who tampered with localStorage (intentionally or via extension) 
blocked from email links.
   - ⚠️ Security feature becomes unusable for affected users.
   ```
   </details>
   
   ```suggestion
       const parsed = stored ? JSON.parse(stored) : [];
       return Array.isArray(parsed) ? parsed : [];
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. User manually modifies localStorage or uses a browser extension that sets
   `superset_trusted_urls` key to a non-array value (e.g., string `"evil"` or 
object
   `{"foo":"bar"}`).
   
   2. User receives an alert/report email containing an external link (e.g., `<a
   href="https://superset/redirect?url=https://external.com";>`).
   
   3. User clicks the link and navigates to `/redirect/` route (defined at
   `src/views/routes.tsx:199`), triggering the `RedirectWarning` component at
   `src/pages/RedirectWarning/index.tsx:98`.
   
   4. Component initializes and calls `isUrlTrusted(targetUrl)` at line 106 via 
useMemo hook.
   
   5. `isUrlTrusted()` at `utils.ts:82-85` calls `getTrustedUrls()` which 
returns the
   non-array value from localStorage.
   
   6. Code attempts to call `.some()` on the returned value at `utils.ts:84`, 
causing:
   `TypeError: parsed.some is not a function`, crashing the redirect warning 
page.
   
   7. User is unable to proceed to external URL or see the warning page, 
experiencing a
   blank/crashed page instead.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset-frontend/src/pages/RedirectWarning/utils.ts
   **Line:** 66:66
   **Comment:**
        *Possible Bug: The `getTrustedUrls` function assumes `JSON.parse` 
returns an array, but if localStorage has been tampered with (e.g., contains a 
string or object instead of an array), calling `.some()` on the result will 
throw a TypeError. Add validation to ensure the parsed value is actually an 
array before returning it.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F35021&comment_hash=43db453146f5ae4e6dc44616d08ebc8d58e17be5def0beae252fb02ced2273e4&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F35021&comment_hash=43db453146f5ae4e6dc44616d08ebc8d58e17be5def0beae252fb02ced2273e4&reaction=dislike'>👎</a>



##########
superset-frontend/src/pages/RedirectWarning/index.tsx:
##########
@@ -0,0 +1,183 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import { useState, useMemo, useCallback } from 'react';
+import { t } from '@superset-ui/core';
+import { css, styled, useTheme } from '@apache-superset/core/ui';
+import {
+  Button,
+  Card,
+  Checkbox,
+  Flex,
+  Typography,
+} from '@superset-ui/core/components';
+import { Icons } from '@superset-ui/core/components/Icons';
+import {
+  getTargetUrl,
+  isUrlTrusted,
+  trustUrl,
+  isAllowedScheme,
+} from './utils';
+
+const PageContainer = styled(Flex)`
+  ${({ theme }) => css`
+    height: calc(100vh - 64px);
+    background-color: ${theme.colorBgLayout};
+    padding: ${theme.padding}px;
+  `}
+`;
+
+const WarningCard = styled(Card)`
+  ${({ theme }) => css`
+    max-width: 520px;
+    width: 100%;
+    box-shadow: ${theme.boxShadowSecondary};
+  `}
+`;
+
+const WarningHeader = styled(Flex)`
+  ${({ theme }) => css`
+    padding: ${theme.paddingLG}px ${theme.paddingXL}px;
+    border-bottom: 1px solid ${theme.colorBorderSecondary};
+  `}
+`;
+
+const WarningBody = styled.div`
+  ${({ theme }) => css`
+    padding: ${theme.paddingXL}px;
+  `}
+`;
+
+const UrlDisplay = styled(Flex)`
+  ${({ theme }) => css`
+    background-color: ${theme.colorFillQuaternary};
+    border-radius: ${theme.borderRadiusSM}px;
+    padding: ${theme.paddingSM}px ${theme.padding}px;
+    margin-bottom: ${theme.margin}px;
+  `}
+`;
+
+const UrlText = styled(Typography.Text)`
+  ${({ theme }) => css`
+    font-family: ${theme.fontFamilyCode};
+    font-size: ${theme.fontSize}px;
+    word-break: break-all;
+  `}
+`;
+
+const WarningFooter = styled(Flex)`
+  ${({ theme }) => css`
+    padding: ${theme.padding}px ${theme.paddingXL}px;
+    background-color: ${theme.colorFillAlter};
+    border-top: 1px solid ${theme.colorBorderSecondary};
+  `}
+`;
+
+const WarningTitle = styled(Typography.Title)`
+  && {
+    margin: 0;
+  }
+`;
+
+export default function RedirectWarning() {
+  const theme = useTheme();
+  const [trustChecked, setTrustChecked] = useState(false);
+
+  const targetUrl = useMemo(() => getTargetUrl(), []);
+
+  // If already trusted, redirect immediately
+  const alreadyTrusted = useMemo(
+    () => Boolean(targetUrl && isUrlTrusted(targetUrl)),
+    [targetUrl],
+  );
+
+  if (alreadyTrusted && targetUrl) {
+    window.location.href = targetUrl;
+  }

Review Comment:
   **Suggestion:** Redirecting by assigning `window.location.href` during the 
render phase is a React anti-pattern that causes side effects during render. 
This should be wrapped in a `useEffect` hook. Additionally, the automatic 
redirect for trusted URLs should validate the URL scheme using 
`isAllowedScheme` to prevent potential open-redirect vulnerabilities if 
localStorage is manipulated to contain malicious URLs (e.g., `javascript:` 
schemes). [possible bug]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ React anti-pattern: side effects during render cause double execution in 
Strict Mode.
   - ❌ Security bypass: trusted URLs skip `isAllowedScheme` validation enabling 
javascript: XSS.
   - ⚠️ Inconsistent validation between auto-redirect and manual continue flow.
   ```
   </details>
   
   ```suggestion
     // If already trusted, redirect immediately
     const alreadyTrusted = useMemo(
       () => Boolean(targetUrl && isUrlTrusted(targetUrl)),
       [targetUrl],
     );
   
     useEffect(() => {
       if (alreadyTrusted && targetUrl && isAllowedScheme(targetUrl)) {
         window.location.href = targetUrl;
       }
     }, [alreadyTrusted, targetUrl]);
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. User receives alert/report email containing external link and clicks it, 
navigating to
   `/redirect?url=https://example.com` (endpoint implemented by this new page 
component).
   
   2. If user previously trusted the domain via the "Trust this URL" checkbox,
   `isUrlTrusted(targetUrl)` at line 106 returns `true` from localStorage.
   
   3. During component render phase (before useEffect), the condition `if 
(alreadyTrusted &&
   targetUrl)` at line 110 evaluates to true, triggering immediate execution of
   `window.location.href = targetUrl` - a side effect during React's render 
phase.
   
   4. If localStorage was manipulated to mark a malicious URL like
   `javascript:alert(document.cookie)` as trusted, the auto-redirect executes 
without
   `isAllowedScheme` validation (unlike the manual "Continue" button which 
validates at line
   115), leading to XSS vulnerability.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset-frontend/src/pages/RedirectWarning/index.tsx
   **Line:** 104:112
   **Comment:**
        *Possible Bug: Redirecting by assigning `window.location.href` during 
the render phase is a React anti-pattern that causes side effects during 
render. This should be wrapped in a `useEffect` hook. Additionally, the 
automatic redirect for trusted URLs should validate the URL scheme using 
`isAllowedScheme` to prevent potential open-redirect vulnerabilities if 
localStorage is manipulated to contain malicious URLs (e.g., `javascript:` 
schemes).
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F35021&comment_hash=8042bac7af4bafc5241885cb49a4c3e9cb2473c6c425ed02ea4a60a314c900ef&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F35021&comment_hash=8042bac7af4bafc5241885cb49a4c3e9cb2473c6c425ed02ea4a60a314c900ef&reaction=dislike'>👎</a>



##########
superset/utils/link_redirect.py:
##########
@@ -0,0 +1,149 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Utilities for processing links in alert/report emails.
+
+External links are rewritten to go through a redirect warning page so that
+recipients see a confirmation before navigating to an external site.
+"""
+
+import logging
+import re
+from urllib.parse import quote, urlparse
+
+from flask import current_app
+
+logger = logging.getLogger(__name__)
+
+# Matches href="..." in anchor tags (both single and double quotes)
+_HREF_RE = re.compile(
+    r"""(<a\s[^>]*?href\s*=\s*)(["'])(.*?)\2""",
+    re.IGNORECASE | re.DOTALL,
+)
+
+
+def _get_base_hosts() -> set[str]:
+    """Return the set of hosts that are considered internal (lower-cased)."""
+    hosts: set[str] = set()
+    for key in ("WEBDRIVER_BASEURL_USER_FRIENDLY", "WEBDRIVER_BASEURL"):
+        url = current_app.config.get(key, "")
+        if url:
+            parsed = urlparse(url)
+            if parsed.scheme and parsed.netloc:
+                hosts.add(parsed.netloc.lower())
+    return hosts
+
+
+def _get_redirect_base() -> str:
+    """Return the base URL used to build redirect links."""
+    for key in ("WEBDRIVER_BASEURL_USER_FRIENDLY", "WEBDRIVER_BASEURL"):
+        url = current_app.config.get(key, "")
+        if url:
+            return url.rstrip("/")
+    return ""
+
+
+def _is_external(href: str, base_hosts: set[str]) -> bool:
+    """Return True if *href* points to an external host."""
+    parsed = urlparse(href)
+    # Only rewrite http(s) links with a host that differs from ours
+    if parsed.scheme not in ("http", "https"):
+        return False
+    return bool(parsed.netloc) and parsed.netloc.lower() not in base_hosts
+
+
+def _replace_href(
+    match: re.Match[str],
+    base_hosts: set[str],
+    redirect_base: str,
+) -> str:
+    """Regex replacer: rewrite external hrefs to go through the redirect 
page."""
+    prefix, quote_char, href = match.group(1), match.group(2), match.group(3)
+    href = href.strip()
+
+    # Don't double-redirect
+    if "/redirect/" in href:
+        return match.group(0)
+
+    if not _is_external(href, base_hosts):
+        return match.group(0)
+
+    redirect_url = f"{redirect_base}/redirect/?url={quote(href, safe='')}"
+    return f"{prefix}{quote_char}{redirect_url}{quote_char}"
+
+
+def process_html_links(html_content: str) -> str:
+    """
+    Rewrite external links in *html_content* to go through the redirect page.
+
+    Internal links (matching the configured base URL hosts) are left untouched.
+    """
+    if not html_content or not html_content.strip():
+        return html_content
+
+    if not current_app.config.get("ALERT_REPORTS_ENABLE_LINK_REDIRECT", True):
+        return html_content
+
+    base_hosts = _get_base_hosts()
+    if not base_hosts:
+        logger.warning("No base URL configured, skipping link redirect 
processing")
+        return html_content
+
+    redirect_base = _get_redirect_base()
+    if not redirect_base:
+        return html_content
+
+    try:
+        return _HREF_RE.sub(
+            lambda m: _replace_href(m, base_hosts, redirect_base),
+            html_content,
+        )
+    except Exception:
+        logger.warning("Failed to process HTML links", exc_info=True)
+        return html_content

Review Comment:
   **Suggestion:** The process_html_links() function has fail-open behavior on 
exceptions - when the regex substitution raises an exception, it logs a warning 
and returns the original html_content unchanged. For a security feature 
designed to protect users from malicious external links, fail-open behavior is 
dangerous because a parsing error or edge case in the regex could allow 
malicious links to pass through unmodified. A security control should 
fail-closed. The function should return an empty string or sanitized content 
when processing fails, not the potentially malicious original content. 
Alternatively, re-raise the exception after logging to ensure the error is not 
silently ignored. [security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Malicious links bypass redirect warning on regex failures.
   - ❌ Email security control fails open exposing users to phishing.
   - ⚠️ Alert/report notification system compromised on edge case inputs.
   ```
   </details>
   
   ```suggestion
       try:
           return _HREF_RE.sub(
               lambda m: _replace_href(m, base_hosts, redirect_base),
               html_content,
           )
       except Exception:
           logger.exception("Failed to process HTML links")
           # Fail closed: return empty content instead of potentially malicious 
HTML
           return ""
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Alerts/reports system calls `process_html_links()` at
   `superset/utils/link_redirect.py:90` with HTML content containing malicious 
links.
   
   2. Regex substitution at line 112-114 encounters edge case input (e.g., 
malformed HTML
   with nested quotes, extremely long href values triggering recursion limits, 
or exotic
   Unicode causing regex backtracking issues).
   
   3. Exception occurs during `_HREF_RE.sub()` execution, triggering the bare 
`except
   Exception:` handler at line 116.
   
   4. Function returns original `html_content` at line 118 with malicious links 
intact,
   bypassing the security redirect mechanism.
   
   5. Email containing unmodified malicious links is sent to users without any 
warning or
   interstitial page protection.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/link_redirect.py
   **Line:** 111:118
   **Comment:**
        *Security: The process_html_links() function has fail-open behavior on 
exceptions - when the regex substitution raises an exception, it logs a warning 
and returns the original html_content unchanged. For a security feature 
designed to protect users from malicious external links, fail-open behavior is 
dangerous because a parsing error or edge case in the regex could allow 
malicious links to pass through unmodified. A security control should 
fail-closed. The function should return an empty string or sanitized content 
when processing fails, not the potentially malicious original content. 
Alternatively, re-raise the exception after logging to ensure the error is not 
silently ignored.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F35021&comment_hash=a5f8d070e6e50c0ed19e39bf5778a4254af2b45077bd0ced7bfcc12652e73b64&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F35021&comment_hash=a5f8d070e6e50c0ed19e39bf5778a4254af2b45077bd0ced7bfcc12652e73b64&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to