codeant-ai-for-open-source[bot] commented on code in PR #38403:
URL: https://github.com/apache/superset/pull/38403#discussion_r2943415328


##########
superset/mcp_service/chart/schemas.py:
##########
@@ -590,6 +590,111 @@ class MixedTimeseriesChartConfig(BaseModel):
     filters: List[FilterConfig] | None = None
 
 
+class BigNumberChartConfig(BaseModel):
+    model_config = ConfigDict(extra="forbid")
+
+    chart_type: Literal["big_number"] = Field(
+        ...,
+        description=(
+            "Chart type discriminator - MUST be 'big_number'. "
+            "Creates Big Number charts that display a single prominent "
+            "metric value. Set show_trendline=True with a temporal_column "
+            "for a number with trendline, or leave show_trendline=False "
+            "for a standalone number."
+        ),
+    )
+    metric: ColumnRef = Field(
+        ...,
+        description=(
+            "The metric to display as a big number. "
+            "Must include an aggregate function (e.g., SUM, COUNT)."
+        ),
+    )
+    temporal_column: str | None = Field(
+        None,
+        description=(
+            "Temporal column for the trendline x-axis. "
+            "Required when show_trendline is True."
+        ),
+        min_length=1,
+        max_length=255,
+        pattern=r"^[a-zA-Z0-9_][a-zA-Z0-9_\s\-\.]*$",
+    )

Review Comment:
   **Suggestion:** `temporal_column` is accepted as raw user input and passed 
through to `granularity_sqla` without the same sanitization used for other 
column identifiers. This can allow SQL-keyword payloads or malformed values 
that break query generation or create an injection surface; sanitize it with 
`sanitize_user_input(..., check_sql_keywords=True)` in a field validator. 
[security]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Big Number trendline requests can fail at compile stage.
   - ⚠️ Temporal identifiers lack consistent SQL-keyword sanitization path.
   ```
   </details>
   
   ```suggestion
       temporal_column: str | None = Field(
           None,
           description=(
               "Temporal column for the trendline x-axis. "
               "Required when show_trendline is True."
           ),
           min_length=1,
           max_length=255,
           pattern=r"^[a-zA-Z0-9_][a-zA-Z0-9_\s\-\.]*$",
       )
   
       @field_validator("temporal_column")
       @classmethod
       def sanitize_temporal_column(cls, v: str | None) -> str | None:
           """Sanitize temporal column name to prevent injection attacks."""
           return sanitize_user_input(
               v,
               "Temporal column",
               max_length=255,
               allow_empty=True,
               check_sql_keywords=True,
           )
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Call MCP `generate_chart` 
(`superset/mcp_service/chart/tool/generate_chart.py:124`)
   with `config.chart_type='big_number'`, `show_trendline=true`, and
   `temporal_column='SELECT'` (or another keyword-looking identifier).
   
   2. Request parsing succeeds because `BigNumberChartConfig.temporal_column` in
   `superset/mcp_service/chart/schemas.py:613-622` only applies regex/length 
checks and has
   no sanitizer validator.
   
   3. The value is copied directly into query form_data at
   `superset/mcp_service/chart/chart_utils.py:674` 
(`form_data["granularity_sqla"] =
   config.temporal_column`).
   
   4. Dataset validation does not enforce Big Number column existence/types
   (`superset/mcp_service/chart/validation/dataset_validator.py:56-57`, 
`178-197`), so this
   reaches compile/query execution and can fail generation with 
`CHART_COMPILE_FAILED` in
   `generate_chart.py:296-336`.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/chart/schemas.py
   **Line:** 613:622
   **Comment:**
        *Security: `temporal_column` is accepted as raw user input and passed 
through to `granularity_sqla` without the same sanitization used for other 
column identifiers. This can allow SQL-keyword payloads or malformed values 
that break query generation or create an injection surface; sanitize it with 
`sanitize_user_input(..., check_sql_keywords=True)` in a field validator.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38403&comment_hash=e84c3d426a6358489f77be1cdcfbe96b9ec1ea21aede831d4c3a090755525231&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38403&comment_hash=e84c3d426a6358489f77be1cdcfbe96b9ec1ea21aede831d4c3a090755525231&reaction=dislike'>👎</a>



##########
superset/mcp_service/chart/schemas.py:
##########
@@ -590,6 +590,111 @@ class MixedTimeseriesChartConfig(BaseModel):
     filters: List[FilterConfig] | None = None
 
 
+class BigNumberChartConfig(BaseModel):
+    model_config = ConfigDict(extra="forbid")
+
+    chart_type: Literal["big_number"] = Field(
+        ...,
+        description=(
+            "Chart type discriminator - MUST be 'big_number'. "
+            "Creates Big Number charts that display a single prominent "
+            "metric value. Set show_trendline=True with a temporal_column "
+            "for a number with trendline, or leave show_trendline=False "
+            "for a standalone number."
+        ),
+    )
+    metric: ColumnRef = Field(
+        ...,
+        description=(
+            "The metric to display as a big number. "
+            "Must include an aggregate function (e.g., SUM, COUNT)."
+        ),
+    )
+    temporal_column: str | None = Field(
+        None,
+        description=(
+            "Temporal column for the trendline x-axis. "
+            "Required when show_trendline is True."
+        ),
+        min_length=1,
+        max_length=255,
+        pattern=r"^[a-zA-Z0-9_][a-zA-Z0-9_\s\-\.]*$",
+    )
+    time_grain: TimeGrain | None = Field(
+        None,
+        description=(
+            "Time granularity for trendline data. "
+            "Common values: PT1H (hour), P1D (day), P1W (week), "
+            "P1M (month), P1Y (year)."
+        ),
+    )
+    show_trendline: bool = Field(
+        False,
+        description=(
+            "Show a trendline below the big number. "
+            "Requires 'temporal_column' to be set."
+        ),
+    )
+    subheader: str | None = Field(
+        None,
+        description="Subtitle text displayed below the big number",
+        max_length=500,
+    )

Review Comment:
   **Suggestion:** `subheader` is user-controlled display text but is not 
sanitized before being stored and returned in chart configuration. This 
introduces an XSS risk when rendered by clients; sanitize it using the existing 
`sanitize_user_input` path used elsewhere in this module. [security]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ User subtitle text persists unsanitized in chart params.
   - ⚠️ External clients may render unsafe subtitle content.
   ```
   </details>
   
   ```suggestion
       subheader: str | None = Field(
           None,
           description="Subtitle text displayed below the big number",
           max_length=500,
       )
   
       @field_validator("subheader")
       @classmethod
       def sanitize_subheader(cls, v: str | None) -> str | None:
           """Sanitize subtitle text to prevent XSS attacks."""
           return sanitize_user_input(v, "Subheader", max_length=500, 
allow_empty=True)
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Call MCP `generate_chart` 
(`superset/mcp_service/chart/tool/generate_chart.py:124`)
   with Big Number config containing `subheader` HTML/script-like text.
   
   2. `BigNumberChartConfig` accepts it as-is because `subheader` in
   `superset/mcp_service/chart/schemas.py:638-642` has only max length, no 
sanitizing
   validator.
   
   3. Mapping layer copies raw text into chart form_data
   (`superset/mcp_service/chart/chart_utils.py:663-664`), and save mode 
persists it in chart
   params (`generate_chart.py:106-113`).
   
   4. Response also returns raw `form_data` to clients 
(`generate_chart.py:728-730`), so any
   downstream renderer not escaping this field can surface XSS-like content.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/chart/schemas.py
   **Line:** 638:642
   **Comment:**
        *Security: `subheader` is user-controlled display text but is not 
sanitized before being stored and returned in chart configuration. This 
introduces an XSS risk when rendered by clients; sanitize it using the existing 
`sanitize_user_input` path used elsewhere in this module.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38403&comment_hash=502280911f19727ea12588671747b529f3cd9462880a8c4d32958bec46fa2ef7&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38403&comment_hash=502280911f19727ea12588671747b529f3cd9462880a8c4d32958bec46fa2ef7&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to