codeant-ai-for-open-source[bot] commented on code in PR #40640:
URL: https://github.com/apache/superset/pull/40640#discussion_r3337861732


##########
superset/utils/schema.py:
##########
@@ -51,3 +54,26 @@ def validate_json(value: Union[bytes, bytearray, str]) -> 
None:
         json.validate_json(value)
     except json.JSONDecodeError as ex:
         raise ValidationError("JSON not valid") from ex
+
+
+def validate_external_url(value: Optional[str]) -> None:
+    """
+    Validator for externally managed object URLs.
+
+    Restricts the accepted URL schemes to ``http`` and ``https`` so that
+    other schemes (for example ``javascript:``, ``data:`` or ``vbscript:``)
+    cannot be stored and later rendered by clients. Empty values are allowed
+    since the field is optional.
+
+    :param value: the URL to validate
+    :raises ValidationError: if the value uses a disallowed scheme
+    """
+    if not value:
+        return
+
+    scheme = urlparse(value).scheme.lower()
+    if scheme not in ALLOWED_URL_SCHEMES:
+        raise ValidationError(
+            "URL must use one of the following schemes: "
+            f"{', '.join(sorted(ALLOWED_URL_SCHEMES))}."

Review Comment:
   **Suggestion:** The validator checks only the scheme and does not enforce an 
absolute URL structure, so malformed values like `https:foo` (no host) are 
accepted even though callers expect externally managed links to be absolute 
HTTP(S) URLs. Validate both scheme and URL authority (eg, non-empty `netloc`) 
to prevent storing invalid links that break downstream consumers. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Charts may store invalid external links like `https:foo`.
   - ⚠️ Dashboards can expose malformed external URLs to clients.
   - ⚠️ Frontend navigation buttons may break for such resources.
   - ⚠️ Import flows (`ImportV1ChartSchema`) accept malformed URLs.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Use any API path that persists an externally managed chart or dashboard 
with an
   `external_url` field validated by `validate_external_url`, such as `POST 
/api/v1/chart/`
   or `PUT /api/v1/chart/<id>` which use `ChartPostSchema` and `ChartPutSchema` 
respectively
   (`superset/charts/schemas.py:34-91`), or `POST /api/v1/dashboard/` which uses
   `BaseDashboardSchema` with `external_url = fields.String(...,
   validate=validate_external_url)` at `superset/dashboards/schemas.py:11-13`.
   
   2. In the JSON payload, set `is_managed_externally` to `true` and provide a 
malformed but
   scheme-prefixed URL like `"https:foo"` or `"http:bar"` as `external_url`; 
when the schema
   runs, `validate_external_url` at `superset/utils/schema.py:59-79` is invoked 
because the
   field's `validate` argument points to this function (see
   `superset/charts/schemas.py:29-31` and `:87-88`, and
   `superset/dashboards/schemas.py:11-13`).
   
   3. Inside `validate_external_url`, `urlparse("https:foo")` yields 
`scheme="https"` and
   `netloc=""` but the validator only checks that `scheme` is in 
`ALLOWED_URL_SCHEMES` and
   does not validate `netloc`; as a result, the malformed `"https:foo"` value 
passes
   validation and is stored as-is on the chart or dashboard record.
   
   4. Later, when clients or the frontend read these objects via the standard 
chart or
   dashboard APIs (e.g. `/api/v1/chart/<id>` or `/api/v1/dashboard/<id>` which 
serialize
   `external_url` using the same schemas), they receive an `external_url` that 
is not an
   absolute HTTP(S) URL, causing broken links or navigation failures wherever 
the UI expects
   fully qualified external links.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=8e35d749670b4b12b15cea4f388d3b91&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=8e35d749670b4b12b15cea4f388d3b91&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/schema.py
   **Line:** 74:78
   **Comment:**
        *Logic Error: The validator checks only the scheme and does not enforce 
an absolute URL structure, so malformed values like `https:foo` (no host) are 
accepted even though callers expect externally managed links to be absolute 
HTTP(S) URLs. Validate both scheme and URL authority (eg, non-empty `netloc`) 
to prevent storing invalid links that break downstream consumers.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40640&comment_hash=44bb81a400130a7d1de40315cc8d2aeca5b86e68a134203f970201c497fc6e08&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40640&comment_hash=44bb81a400130a7d1de40315cc8d2aeca5b86e68a134203f970201c497fc6e08&reaction=dislike'>👎</a>



##########
superset/charts/schemas.py:
##########
@@ -62,6 +62,34 @@ def get_time_grain_choices() -> Any:
     ]
 
 
+# Fallback upper bound for the number of Prophet forecast periods when the
+# application config cannot be read (for example, outside of an app context).
+DEFAULT_MAX_PROPHET_PERIODS = 10000
+
+
+def get_max_prophet_periods() -> int:
+    """Get the configured upper bound for Prophet forecast periods."""
+    try:
+        return current_app.config.get(
+            "MAX_PROPHET_PERIODS", DEFAULT_MAX_PROPHET_PERIODS
+        )

Review Comment:
   **Suggestion:** `MAX_PROPHET_PERIODS` is read directly from config without 
type normalization, so a common override like `os.getenv(...)` (string) will 
make `Range(max=...)` compare `int` to `str` and raise a runtime `TypeError` 
instead of a validation error. Cast the config value to `int` (and handle 
invalid/negative values) before passing it to `Range`. [type error]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Prophet post-processing requests can crash on mis-typed config.
   - ⚠️ Chart data API `/api/v1/chart/data` returns 500 on validation.
   - ⚠️ MCP chart tools using `ChartDataQueryContextSchema` also break.
   - ⚠️ Misconfigured `MAX_PROPHET_PERIODS` hard-fails all Prophet charts.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure Superset to override the Prophet periods bound by setting
   `MAX_PROPHET_PERIODS` to a string in `superset_config.py` (loaded into
   `current_app.config`), for example at `superset/config.py:1322` default is
   `MAX_PROPHET_PERIODS = 10000` but in your override file use 
`MAX_PROPHET_PERIODS =
   "10000"` or `MAX_PROPHET_PERIODS = os.getenv("MAX_PROPHET_PERIODS", 
"10000")` without
   casting to int.
   
   2. Start the Superset web application so that 
`current_app.config["MAX_PROPHET_PERIODS"]`
   now holds a `str` instead of an `int`, and hit the chart data API endpoint 
that uses
   Prophet post-processing (`POST /api/v1/chart/data` implemented in
   `superset/charts/data/api.py`, which imports `ChartDataQueryContextSchema` 
at line 39 and
   calls `ChartDataQueryContextSchema().load(form_data)` at
   `superset/charts/data/api.py:664`).
   
   3. In the request body, include a `queries[0].post_processing` entry with an 
operation of
   type `"prophet"` whose `options` include a valid integer `periods` (for 
example
   `{"operation": "prophet", "options": {"time_grain": "P1D", "periods": 7,
   "confidence_interval": 0.8}}`), so that Marshmallow instantiates and 
validates
   `ChartDataProphetOptionsSchema` defined at 
`superset/charts/schemas.py:48-72`.
   
   4. During validation, `validate_prophet_periods` at 
`superset/charts/schemas.py:42-51`
   calls `Range(min=1, max=get_max_prophet_periods(), ...)` where 
`get_max_prophet_periods()`
   returns the string `"10000"` from `current_app.config` (lines 31-36); 
Marshmallow's
   `Range.__call__` then compares the integer `value` (e.g. 7) to the string 
`max` using
   `<`/`>`, raising a `TypeError` instead of a `ValidationError`, which 
propagates out of
   `ChartDataQueryContextSchema().load()` (no catch for `TypeError` in
   `superset/charts/data/api.py:650-689) and results in a 500-style internal 
error for the
   chart data request rather than a clean validation message.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=8544194020d4465d8b3c9a6cdc173125&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=8544194020d4465d8b3c9a6cdc173125&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/charts/schemas.py
   **Line:** 73:75
   **Comment:**
        *Type Error: `MAX_PROPHET_PERIODS` is read directly from config without 
type normalization, so a common override like `os.getenv(...)` (string) will 
make `Range(max=...)` compare `int` to `str` and raise a runtime `TypeError` 
instead of a validation error. Cast the config value to `int` (and handle 
invalid/negative values) before passing it to `Range`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40640&comment_hash=91ba1d60c0796e90e0c466747b2fb50154f1098084fc312e33d8816e61b7b80c&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40640&comment_hash=91ba1d60c0796e90e0c466747b2fb50154f1098084fc312e33d8816e61b7b80c&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to