Re: [PR] feat(mcp): add unified get_schema tool for schema discovery [superset]

via GitHub Mon, 08 Dec 2025 09:44:17 -0800


codeant-ai-for-open-source[bot] commented on PR #36458:
URL: https://github.com/apache/superset/pull/36458#issuecomment-3628244034


   ## Nitpicks 🔍
   
   <table>
   <tr><td>🔒&nbsp;<strong>No security issues identified</strong></td></tr>
   <tr><td>⚡&nbsp;<strong>Recommended areas for review</strong><br><br>
   
   - [ ] <a 
href='https://github.com/apache/superset/pull/36458/files#diff-c895cf0d2e35cd84dd278b251c1e4ab8a0ecb9588e21b13e7c6ace5113f862b4R56-R103'><strong>Eager
 init</strong></a><br>`ModelGetSchemaCore` instances are created at import 
time. If construction becomes heavier (or DAOs have side effects), this can 
increase startup cost and complicate testing/mocking. Consider lazy 
initialization or a factory to create cores on first use.<br>
   
   - [ ] <a 
href='https://github.com/apache/superset/pull/36458/files#diff-33f87750cdefd39870a50f191b5993739c162d0841accdb11e4646ea019da04aR115-R121'><strong>Parsing
 Issue</strong></a><br>`request.select_columns` is used directly when building 
the serialization context. If the request
   field can be provided as a JSON string, CSV, or other serialized form, using 
it directly may lead
   to incorrect behavior. Use the same parsing utility used by `ModelListCore` 
to normalize the value
   (e.g., parse JSON/CSV -> list) before applying it.<br>
   
   - [ ] <a 
href='https://github.com/apache/superset/pull/36458/files#diff-e1a76d1f013e6c55119080fd7ec635310f15bc77f7c49e235244cb3121eb1aa3R120-R125'><strong>Columns
 vs. serialization mismatch</strong></a><br>The response returned by 
ModelListCore includes `columns_loaded` (the actual columns loaded). The code 
uses `request.select_columns` (or default) to filter fields for output; this 
can cause the reported `columns_loaded` to diverge from the fields actually 
serialized, producing inconsistent API responses.<br>
   
   - [ ] <a 
href='https://github.com/apache/superset/pull/36458/files#diff-05a9898ba969a7986371667690eb49e5f685d450c31a1f642e020f6caaf5f28aR136-R145'><strong>Unparsed
 select_columns (Possible Bug)</strong></a><br>The code now uses 
`request.select_columns` directly for the serialization context when calling 
`result.model_dump(...)`. However, `request.select_columns` may be a JSON 
string or CSV (the same inputs that `ModelListCore.run_tool` parses 
internally). Passing an unparsed string into the context can break field 
filtering logic which expects a list of column names. Use the parsed column 
list that the DAO used (or coerce/parse the request value) to ensure consistent 
behavior.<br>
   
   - [ ] <a 
href='https://github.com/apache/superset/pull/36458/files#diff-e1a76d1f013e6c55119080fd7ec635310f15bc77f7c49e235244cb3121eb1aa3R147-R156'><strong>Request
 parsing mismatch</strong></a><br>`request.select_columns` is used directly to 
build the serialization context. However, ModelListCore internally parses 
`select_columns` (accepting CSV / JSON / list). If `request.select_columns` is 
a CSV or raw string, the serialization context may receive a string instead of 
a list which can break filtering logic in DatasetInfo serialization.<br>
   
   - [ ] <a 
href='https://github.com/apache/superset/pull/36458/files#diff-c53108521a676f135ad7846c809dddc3eb33240abb7d123d0948cd5f7b2e9765R356-R396'><strong>Identifier
 mutation</strong></a><br>The `ColumnRef.sanitize_name` validator returns 
`html.escape(...)` (escaped value) for `name`, which can change column 
identifiers used for queries/joins (e.g., converting spaces/quotes to HTML 
entities). That may make the `name` no longer match database/metric identifiers 
and break query generation or equality checks. Consider preserving a canonical 
identifier for internals and only escaping for display.<br>
   
   - [ ] <a 
href='https://github.com/apache/superset/pull/36458/files#diff-7399392e20028281da2c39f4cccfcffeb9ea5d5a0cbaa739d8d072fd1f35c6ebR506-R540'><strong>Serialization
 mismatch</strong></a><br>The simple serializer `serialize_dashboard_object` 
populates keys like `created_by_name` and `changed_by_name` when constructing 
`DashboardInfo`, but `DashboardInfo` defines `created_by` and `changed_by` (no 
`_name` fields). This can lead to ignored fields or validation errors / 
unexpected missing data in responses.<br>
   
   - [ ] <a 
href='https://github.com/apache/superset/pull/36458/files#diff-367d5eb5f000133464ca2fde918009da6b5467e821cb3a94fabd369771201f50R553-R561'><strong>Schema
 field name mismatch</strong></a><br>The run_tool constructs the output schema 
using the keyword `default_select`. Ensure the target Pydantic output_schema 
actually expects `default_select` (not `default_columns` or similar). A 
mismatch will raise validation errors at runtime.<br>
   
   </td></tr>
   </table>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat(mcp): add unified get_schema tool for schema discovery [superset]

Reply via email to