codeant-ai-for-open-source[bot] commented on code in PR #37183:
URL: https://github.com/apache/superset/pull/37183#discussion_r2770535514


##########
superset/mcp_service/chart/tool/get_chart_data.py:
##########
@@ -42,6 +45,21 @@
 logger = logging.getLogger(__name__)
 
 
+def _get_cached_form_data(form_data_key: str) -> str | None:
+    """Retrieve form_data from cache using form_data_key.
+
+    Returns the JSON string of form_data if found, None otherwise.
+    """
+    from superset.commands.explore.form_data.get import GetFormDataCommand
+
+    try:
+        cmd_params = CommandParameters(key=form_data_key)
+        return GetFormDataCommand(cmd_params).run()
+    except (KeyError, ValueError, CommandException) as e:
+        logger.warning("Failed to retrieve form_data from cache: %s", e)

Review Comment:
   **Suggestion:** `_get_cached_form_data` claims to return a string or None but doesn't catch `TemporaryCacheGetFailedError` raised by `GetFormDataCommand.run`, so a transient cache/DB error will bubble up and fail the entire chart data request instead of gracefully falling back to the saved configuration. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ get_chart_data requests fail on transient cache/DB errors.
   - ⚠️ Unsaved-state retrieval won't gracefully fall back.
   - ⚠️ Degrades MCP reliability under cache failures.
   ```
   </details>
   
   ```suggestion
       from superset.commands.explore.form_data.exceptions import (
           TemporaryCacheGetFailedError,
       )
   
       try:
           cmd_params = CommandParameters(key=form_data_key)
           return GetFormDataCommand(cmd_params).run()
       except (
           KeyError,
           ValueError,
           CommandException,
           TemporaryCacheGetFailedError,
       ) as e:
   ```
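   For reference, the helper with the broadened except clause would look roughly like the sketch below; the trailing `return None` and the import placement are assumptions drawn from the helper's documented contract (not part of the suggestion), `CommandParameters`, `CommandException`, and `logger` are assumed to be available at module scope as in the diff, and the module that exports `TemporaryCacheGetFailedError` should be verified (it may live under `superset.commands.temporary_cache.exceptions` instead).

   ```python
   def _get_cached_form_data(form_data_key: str) -> str | None:
       """Retrieve form_data from cache using form_data_key.

       Returns the JSON string of form_data if found, None otherwise.
       """
       # Import path taken from the suggestion above; verify it against the codebase.
       from superset.commands.explore.form_data.exceptions import (
           TemporaryCacheGetFailedError,
       )
       from superset.commands.explore.form_data.get import GetFormDataCommand

       try:
           cmd_params = CommandParameters(key=form_data_key)
           return GetFormDataCommand(cmd_params).run()
       except (
           KeyError,
           ValueError,
           CommandException,
           TemporaryCacheGetFailedError,
       ) as e:
           # Treat transient cache/DB failures as a cache miss so the caller
           # falls back to the saved chart configuration.
           logger.warning("Failed to retrieve form_data from cache: %s", e)
           return None
   ```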
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Store a form_data key in explore_form_data_cache (normal usage) or simulate a cache
   failure. The command used is GetFormDataCommand, defined in
   superset/commands/explore/form_data/get.py (see file lines 35-63 in Additional Downstream
   Code Context).

   2. Trigger the MCP get_chart_data tool with a request that includes form_data_key
   (get_chart_data entry at superset/mcp_service/chart/tool/get_chart_data.py; the function
   begins at line ~65 in the PR diff).

   3. get_chart_data calls _get_cached_form_data(form_data_key) (helper defined at lines
   ~48-61). GetFormDataCommand.run() may raise TemporaryCacheGetFailedError when the cache/DB
   has a transient error (see get.py: run() raising TemporaryCacheGetFailedError on
   SQLAlchemyError at lines 35-63).

   4. Because _get_cached_form_data currently does not catch TemporaryCacheGetFailedError,
   that exception propagates, causing the chart data request to fail instead of logging a
   warning and falling back to the saved chart configuration.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/chart/tool/get_chart_data.py
   **Line:** 54:59
   **Comment:**
        *Logic Error: `_get_cached_form_data` claims to return a string or None but doesn't catch `TemporaryCacheGetFailedError` raised by `GetFormDataCommand.run`, so a transient cache/DB error will bubble up and fail the entire chart data request instead of gracefully falling back to the saved configuration.

   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/mcp_service/dashboard/tool/get_dashboard_info.py:
##########
@@ -71,14 +110,45 @@ async def get_dashboard_info(
         result = tool.run_tool(request.identifier)
 
         if isinstance(result, DashboardInfo):
+            # If permalink_key is provided, retrieve filter state
+            if request.permalink_key:
+                await ctx.info(
+                    "Retrieving filter state from permalink: permalink_key=%s"
+                    % (request.permalink_key,)
+                )
+                permalink_value = _get_permalink_state(request.permalink_key)

Review Comment:
   **Suggestion:** The synchronous permalink lookup `_get_permalink_state` is called directly inside an async tool, which can block the event loop during database and key-value store access; wrapping it in `asyncio.to_thread` avoids blocking other async tasks. [possible bug]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ MCP tool latency increases under concurrent requests.
   - ⚠️ Event loop stalls during DB/key-value access.
   - ⚠️ Affects all callers of get_dashboard_info with permalink_key.
   ```
   </details>
   
   ```suggestion
                   import asyncio
   
                   permalink_value = await asyncio.to_thread(
                       _get_permalink_state, request.permalink_key
                   )
   ```
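   If this pattern is needed in more than one tool, a small async wrapper keeps the handlers tidy. This is only a sketch: the wrapper name is hypothetical and it assumes the module's existing `_get_permalink_state` helper.

   ```python
   import asyncio
   from typing import Any


   async def _get_permalink_state_async(permalink_key: str) -> dict[str, Any] | None:
       """Hypothetical wrapper: run the blocking permalink lookup in a worker thread.

       Keeps the FastMCP event loop responsive during DB / key-value access. The
       concrete return type may be a TypedDict rather than a plain dict.
       """
       return await asyncio.to_thread(_get_permalink_state, permalink_key)
   ```

   Any other synchronous command invoked from these async tools would benefit from the same `asyncio.to_thread` treatment.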
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Start the MCP service (superset/cli/mcp.py:32-44) so FastMCP async handlers run.

   2. Send concurrent FastMCP requests to get_dashboard_info
   (superset/mcp_service/dashboard/tool/get_dashboard_info.py:62), each with a permalink_key
   so the code path at lines 114-119 is executed.

   3. Execution hits the synchronous helper _get_permalink_state defined at
   superset/mcp_service/dashboard/tool/get_dashboard_info.py:45-57. That function calls
   GetDashboardPermalinkCommand.run (superset/commands/dashboard/permalink/get.py:38-63),
   which performs synchronous DB and key-value DAO calls (KeyValueDAO.get_value and
   DashboardDAO.get_by_id_or_slug).

   4. Because get_dashboard_info is async but calls the blocking _get_permalink_state
   directly at line 119, the event loop running FastMCP's async handlers can be blocked for
   the duration of DB / key-value I/O. Under concurrent requests, this causes increased
   latency and stalls other async tasks.

   5. Reproduce deterministically by issuing multiple concurrent get_dashboard_info requests
   (each with permalink_key) and observing increased response latency or serialized handling
   compared to wrapping _get_permalink_state in asyncio.to_thread.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dashboard/tool/get_dashboard_info.py
   **Line:** 119:119
   **Comment:**
        *Possible Bug: The synchronous permalink lookup `_get_permalink_state` is called directly inside an async tool, which can block the event loop during database and key-value store access; wrapping it in `asyncio.to_thread` avoids blocking other async tasks.

   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/mcp_service/chart/tool/get_chart_data.py:
##########
@@ -29,6 +30,8 @@
 if TYPE_CHECKING:
     from superset.models.slice import Slice
 
+from superset.commands.exceptions import CommandException
+from superset.commands.explore.form_data.parameters import CommandParameters

Review Comment:
   **Suggestion:** The import for `CommandParameters` points to `superset.commands.explore.form_data.parameters`, but the class is defined in `superset.commands.explore.parameters`, so the module raises an ImportError as soon as it is imported, making `_get_cached_form_data` (and the whole tool) unusable. [possible bug]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ MCP service may fail to start due to ImportError.
   - ❌ get_chart_data tool unavailable to callers.
   - ⚠️ Any dashboards relying on MCP tools affected.
   ```
   </details>
   
   ```suggestion
   from superset.commands.explore.parameters import CommandParameters
   ```
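   A quick way to check the fix is to import the tool module directly; with the wrong path the import fails immediately with ModuleNotFoundError (a subclass of ImportError). This assumes a configured Superset environment where the package is importable.

   ```python
   # Fails on the bad import path, succeeds once the import is corrected.
   import importlib

   importlib.import_module("superset.mcp_service.chart.tool.get_chart_data")
   ```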
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Start the MCP service which imports tools (module import occurs). See file:
   superset/mcp_service/chart/tool/get_chart_data.py (module top-level imports around lines
   33-38). Python imports the module and executes top-level imports.

   2. During import, Python evaluates "from superset.commands.explore.form_data.parameters
   import CommandParameters" (line shown above). That module path does not exist.

   3. The real class is defined in superset/commands/explore/parameters.py (see Additional
   Downstream Code Context: file superset/commands/explore/parameters.py lines 22-27 where
   CommandParameters is declared).

   4. Result: ImportError is raised during module import, preventing the MCP tool module from
   loading and making get_chart_data unusable. This is deterministic and reproducible on
   service start.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/chart/tool/get_chart_data.py
   **Line:** 34:34
   **Comment:**
        *Possible Bug: The import for `CommandParameters` points to `superset.commands.explore.form_data.parameters`, but the class is defined in `superset.commands.explore.parameters`, so the module raises an ImportError as soon as it is imported, making `_get_cached_form_data` (and the whole tool) unusable.

   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/mcp_service/chart/tool/get_chart_data.py:
##########
@@ -119,20 +144,102 @@ async def get_chart_data(  # noqa: C901
         )
         logger.info("Getting data for chart %s: %s", chart.id, 
chart.slice_name)
 
-        import time
-
         start_time = time.time()
 
+        # Track whether we're using unsaved state
+        using_unsaved_state = False
+        cached_form_data_dict = None
+
         try:
             await ctx.report_progress(2, 4, "Preparing data query")
             from superset.charts.schemas import ChartDataQueryContextSchema
             from superset.commands.chart.data.get_data_command import ChartDataCommand
 
+            # Check if form_data_key is provided - use cached form_data instead
+            if request.form_data_key:
+                await ctx.info(
+                    "Retrieving unsaved chart state from cache: form_data_key=%s"
+                    % (request.form_data_key,)
+                )
+                if cached_form_data := _get_cached_form_data(request.form_data_key):
+                    try:
+                        cached_form_data_dict = utils_json.loads(cached_form_data)
+                        using_unsaved_state = True
+                        await ctx.info(
+                            "Using cached form_data from form_data_key for data query"
+                        )
+                    except (TypeError, ValueError) as e:
+                        await ctx.warning(
+                            "Failed to parse cached form_data: %s. "
+                            "Falling back to saved chart configuration." % str(e)
+                        )
+                else:
+                    await ctx.warning(
+                        "form_data_key provided but no cached data found. "
+                        "The cache may have expired. Using saved chart configuration."
+                    )
+
             # Use the chart's saved query_context - this is the key!
             # The query_context contains all the information needed to reproduce
             # the chart's data exactly as shown in the visualization
             query_context_json = None
-            if chart.query_context:
+
+            # If using cached form_data, we need to build query_context from it
+            if using_unsaved_state and cached_form_data_dict:
+                # Build query context from cached form_data (unsaved state)
+                from superset.common.query_context_factory import QueryContextFactory
+

Review Comment:
   **Suggestion:** The condition `if using_unsaved_state and cached_form_data_dict:` relies on the truthiness of the decoded form-data dict, so an empty but valid JSON object (`{}`) sets `using_unsaved_state` to True but skips building a query context and also disables the fallback path, leading to `query_context` never being assigned and causing an UnboundLocalError when executing the query. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ get_chart_data crashes on empty cached form_data.
   - ⚠️ Explaining edited-but-unsaved charts fails.
   - ⚠️ Breaks MCP tool reliability for unsaved states.
   ```
   </details>
   
   ```suggestion
               if using_unsaved_state and cached_form_data_dict is not None:
   ```
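   An alternative (not part of the one-line suggestion above) is to only flip `using_unsaved_state` when the parsed payload is a non-empty dict, so an empty `{}` still falls back to the saved chart configuration. Below is a de-indented sketch of the parsing branch from the diff, with the ctx.info call omitted for brevity.

   ```python
   try:
       cached_form_data_dict = utils_json.loads(cached_form_data)
       # An empty dict is treated as "no unsaved state", so the saved-configuration
       # fallback later in the function still applies.
       using_unsaved_state = bool(cached_form_data_dict)
   except (TypeError, ValueError) as e:
       await ctx.warning(
           "Failed to parse cached form_data: %s. "
           "Falling back to saved chart configuration." % str(e)
       )
   ```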
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Save an "unsaved" Explore state into the cache where the stored form_data 
JSON decodes
   to an empty object: '{}' (this is realistic when users open Explore but 
don't populate
   fields). The cache get path is handled by GetFormDataCommand.run (see
   superset/commands/explore/form_data/get.py lines 35-63).
   
   2. Call the MCP get_chart_data tool with that form_data_key (tool entry at
   superset/mcp_service/chart/tool/get_chart_data.py, function defined at line 
~65).
   
   3. _get_cached_form_data returns the JSON string '{}' and
   utils_json.loads(cached_form_data) yields {}. The code sets 
using_unsaved_state = True
   (when parsing succeeded) but cached_form_data_dict is {} (an empty dict).
   
   4. Later the code checks "if using_unsaved_state and cached_form_data_dict:" 
(lines
   ~185-191). Because {} is falsy, the branch that builds query_context is 
skipped; because
   using_unsaved_state is True, the fallback that constructs a query_context 
from saved
   chart.params (the "if query_context_json is None and not 
using_unsaved_state" block) is
   also skipped. As a result query_context is never assigned and later usage
   (ChartDataCommand(query_context) execution) raises an UnboundLocalError / 
NameError,
   causing the request to fail.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/chart/tool/get_chart_data.py
   **Line:** 191:191
   **Comment:**
        *Logic Error: The condition `if using_unsaved_state and cached_form_data_dict:` relies on the truthiness of the decoded form-data dict, so an empty but valid JSON object (`{}`) sets `using_unsaved_state` to True but skips building a query context and also disables the fallback path, leading to `query_context` never being assigned and causing an UnboundLocalError when executing the query.

   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/mcp_service/dashboard/tool/get_dashboard_info.py:
##########
@@ -71,14 +110,45 @@ async def get_dashboard_info(
         result = tool.run_tool(request.identifier)
 
         if isinstance(result, DashboardInfo):
+            # If permalink_key is provided, retrieve filter state
+            if request.permalink_key:
+                await ctx.info(
+                    "Retrieving filter state from permalink: permalink_key=%s"
+                    % (request.permalink_key,)
+                )
+                permalink_value = _get_permalink_state(request.permalink_key)
+
+                if permalink_value:
+                    # Extract the state from permalink value

Review Comment:
   **Suggestion:** The filter state retrieved from a permalink is applied to whatever dashboard was requested, without verifying that the permalink actually belongs to that dashboard, which can result in returning a dashboard's metadata combined with another dashboard's filter state. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ MCP get_dashboard_info returns mismatched filter_state.
   - ⚠️ Chatbot explains wrong dashboard state to users.
   - ⚠️ Downstream tools may apply incorrect filters.
   ```
   </details>
   
   ```suggestion
                       # Ensure the permalink belongs to the same dashboard
                       # before applying state
                       permalink_dashboard_id = permalink_value.get("dashboardId")
                       if (
                           isinstance(permalink_dashboard_id, int)
                           and permalink_dashboard_id != result.id
                       ):
                           await ctx.warning(
                               "permalink_key dashboardId (%s) does not match "
                               "requested dashboard id (%s); "
                               "ignoring permalink filter state."
                               % (permalink_dashboard_id, result.id)
                           )
                       else:
   ```
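   One caveat: depending on how GetDashboardPermalinkCommand stores its payload, "dashboardId" may hold a string id or slug rather than an int, in which case the isinstance check above would silently skip validation. A hedged sketch that normalizes both sides before comparing (field and attribute names assumed from the surrounding code):

   ```python
   # "dashboardId" may be an int, a numeric string, or a slug; compare as strings so a
   # plain id mismatch is still caught. Slug-based permalinks may need an extra check
   # against a slug attribute if DashboardInfo exposes one.
   permalink_dashboard_id = permalink_value.get("dashboardId")
   if permalink_dashboard_id is not None and str(permalink_dashboard_id) != str(result.id):
       await ctx.warning(
           "permalink_key dashboardId (%s) does not match requested dashboard id (%s); "
           "ignoring permalink filter state." % (permalink_dashboard_id, result.id)
       )
   ```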
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Start the MCP service (see superset/cli/mcp.py:32-44) so FastMCP tools are available.

   2. Call the FastMCP tool get_dashboard_info at
   superset/mcp_service/dashboard/tool/get_dashboard_info.py:62 (function
   get_dashboard_info), passing:

      - identifier = <dashboard A id>

      - permalink_key = <permalink created for dashboard B>

      (FastMCP will route to get_dashboard_info because of the @tool decorator and
      parse_request.)

   3. Inside get_dashboard_info (superset/mcp_service/dashboard/tool/get_dashboard_info.py),
   ModelGetInfoCore.run_tool is executed to load dashboard A (see run_tool call at line
   ~110). The code then reaches the block at lines 114-121 where request.permalink_key is
   present and _get_permalink_state is called.

   4. _get_permalink_state (defined at
   superset/mcp_service/dashboard/tool/get_dashboard_info.py:45-57) calls
   GetDashboardPermalinkCommand.run (superset/commands/dashboard/permalink/get.py:38-63)
   which returns a value containing "dashboardId" and "state" for dashboard B.

   5. Because the existing code applies permalink_value.state to the result without checking
   permalink_value["dashboardId"], the returned DashboardInfo (result) for dashboard A will
   have filter_state populated with dashboard B's state (see lines 121-127), producing a
   mismatched dashboard metadata + filter_state combination.

   6. Observe: the MCP response contains result.filter_state from the unrelated permalink.
   This reproduces the problem deterministically when a permalink_key referencing a different
   dashboard is used. The codebase does not currently validate dashboardId vs result.id, so
   the mismatch will occur whenever a caller supplies a permalink_key for a different
   dashboard.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dashboard/tool/get_dashboard_info.py
   **Line:** 122:122
   **Comment:**
        *Logic Error: The filter state retrieved from a permalink is applied to whatever dashboard was requested, without verifying that the permalink actually belongs to that dashboard, which can result in returning a dashboard's metadata combined with another dashboard's filter state.

   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

