aminghadersohi opened a new pull request, #39898: URL: https://github.com/apache/superset/pull/39898
### SUMMARY

`get_dataset_info` could return ~80KB+ payloads for wide datasets, causing clients to truncate the response and LLM agents to time out trying to recover. Two issues:

1. The response always included verbose top-level fields (`params`, `template_params`, `extra`, `certified_by`, `certification_details`, `tags`, `schema_perm`) regardless of caller need.
2. Each `TableColumnInfo` serialized all 7 fields including long `description` text, so a 50-column dataset with verbose descriptions alone could exceed 30KB.

This change adds two new request parameters to `GetDatasetInfoRequest`:

- **`select_columns`**: top-level fields to include. Defaults to a lean set (`id`, `table_name`, `schema`, `database_name`, `database_id`, `uuid`, `is_virtual`, `description`, `main_dttm_col`, `sql`, `url`, `columns`, `metrics`).
- **`column_fields`**: per-column fields to include in `columns` entries. Defaults to `["column_name", "type", "is_dttm"]`. Wider lists let callers opt in to `verbose_name`, `groupby`, `filterable`, `description`.

`TableColumnInfo` and `DatasetInfo` already had a `model_serializer(mode="wrap")` that reads `select_columns` from the Pydantic serialization context. The tool now passes both `select_columns` and `column_fields` through `model_dump(context=...)` so filtering applies during serialization rather than after, mirroring the pattern already in `list_datasets` and `list_databases`.

The default response shrinks from ~80KB to a few KB for typical wide datasets, while existing callers that pass explicit `select_columns` continue to work unchanged.

### BEFORE/AFTER SCREENSHOTS

N/A: backend change. The behavior change is observable via response payload size on `tools/call` for `get_dataset_info`.

### TESTING INSTRUCTIONS

```bash
pytest tests/unit_tests/mcp_service/dataset/ -v
```

New tests verify:

- Default response excludes verbose top-level and per-column fields.
- Explicit `select_columns` trims the response to requested fields only.
- Explicit `column_fields` opts in to verbose per-column fields.

### ADDITIONAL INFORMATION

- [ ] Has associated issue:
- [ ] Required feature flags:
- [x] Changes UI
- [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
  - [ ] Migration is atomic, supports rollback & is backwards-compatible
  - [ ] Confirm DB migration upgrade and downgrade tested
  - [ ] Runtime estimates and downtime expectations provided
- [ ] Introduces new feature or API
- [ ] Removes existing feature or API
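
To illustrate the two-level filtering described in the summary, here is a minimal sketch that uses plain dictionaries in place of the Pydantic models. It is not the code in this PR: `project_dataset` is a hypothetical helper, the sample dataset is made up, and the real change filters inside the `model_serializer` via the serialization context. Only the default field lists are taken from the summary.

```python
import json

# Defaults as listed in the PR summary.
DEFAULT_SELECT_COLUMNS = [
    "id", "table_name", "schema", "database_name", "database_id", "uuid",
    "is_virtual", "description", "main_dttm_col", "sql", "url",
    "columns", "metrics",
]
DEFAULT_COLUMN_FIELDS = ["column_name", "type", "is_dttm"]


def project_dataset(dataset, select_columns=None, column_fields=None):
    """Hypothetical helper: keep only the requested top-level and per-column keys."""
    top = set(select_columns or DEFAULT_SELECT_COLUMNS)
    per_column = set(column_fields or DEFAULT_COLUMN_FIELDS)
    result = {k: v for k, v in dataset.items() if k in top}
    if "columns" in result:
        result["columns"] = [
            {k: v for k, v in col.items() if k in per_column}
            for col in result["columns"]
        ]
    return result


# Made-up 50-column dataset with verbose descriptions (illustrative data only).
dataset = {
    "id": 1,
    "table_name": "wide_table",
    "schema": "public",
    "params": "{}",
    "extra": "{}",
    "tags": [],
    "columns": [
        {
            "column_name": f"col_{i}",
            "type": "VARCHAR",
            "is_dttm": False,
            "verbose_name": f"Column {i}",
            "groupby": True,
            "filterable": True,
            "description": "long description " * 40,
        }
        for i in range(50)
    ],
}

# Default call: lean top-level fields and lean per-column fields.
lean = project_dataset(dataset)
# Opt in to a verbose per-column field explicitly.
verbose = project_dataset(dataset, column_fields=["column_name", "description"])

full_size = len(json.dumps(dataset))
lean_size = len(json.dumps(lean))
print(f"full payload: {full_size} bytes, default payload: {lean_size} bytes")
```

Most of the savings in this sketch come from dropping the per-column `description` text, which matches the second issue listed in the summary; callers that need it can still request it through `column_fields`.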
