aminghadersohi opened a new pull request, #39912:
URL: https://github.com/apache/superset/pull/39912
### SUMMARY
The MCP response-size-guard middleware (`ResponseSizeGuardMiddleware`)
estimates token counts to decide when to truncate or block oversized tool
responses. The existing estimator at
`superset/mcp_service/utils/token_utils.py` used a simple char-to-token
heuristic (`CHARS_PER_TOKEN = 3.5`) that miscounts JSON-heavy MCP responses
relative to Claude's actual tokenizer. Some responses could slip past the
configured token limit and still be truncated by the Claude Agent SDK's
own threshold. The SDK then saved them into a file the model could not read
back, causing 120s timeouts in tool calls like `get_dataset_info` for wide
datasets.
This PR switches the estimator to **tiktoken's `cl100k_base` encoding** — a
real BPE tokenizer with a vocabulary similar to Claude's. For English and
JSON-heavy content it tracks Claude's counts within roughly ±10%, which is far
closer than any character-ratio heuristic.
The previous heuristic stays as a graceful **fallback** for environments
where tiktoken is not installed; its ratio drops from 3.5 → 3.0 chars/token to
be more conservative for JSON content, which the old ratio under-counted.
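A minimal sketch of the estimator shape described above, not the actual contents of `token_utils.py`. The names `_ENCODING`, `CHARS_PER_TOKEN`, and `estimate_token_count` appear in this PR; the rounding, the bytes decoding policy, and the broad `except` clauses are assumptions made for illustration.

```python
CHARS_PER_TOKEN = 3.0  # fallback ratio named in this PR

try:
    import tiktoken

    _ENCODING = tiktoken.get_encoding("cl100k_base")
except Exception:
    # tiktoken is not installed, or the encoding could not be loaded.
    _ENCODING = None


def estimate_token_count(text):
    """Estimate the token count of str or bytes input without raising."""
    if isinstance(text, bytes):
        # Assumption: undecodable bytes are replaced rather than rejected.
        text = text.decode("utf-8", errors="replace")
    if _ENCODING is not None:
        try:
            return len(_ENCODING.encode(text))
        except Exception:
            # For example, tiktoken raises ValueError by default when the
            # text contains a special-token string such as "<|endoftext|>".
            pass
    # Heuristic path: characters divided by the conservative ratio.
    return int(len(text) / CHARS_PER_TOKEN)


print(estimate_token_count('{"columns": ["id", "name", "created_at"]}'))
```

Catching every exception around `encode` keeps the guard on the heuristic path instead of letting an oversized response through unmeasured.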
### BEFORE/AFTER
```
estimate_token_count("an 80KB JSON dataset info response")
before (3.5 chars/token): ~22,800 tokens (slipped past 25k cap)
after (tiktoken cl100k_base): accurate Claude-aligned count
```
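The heuristic arithmetic behind the "before" figure can be checked directly. This is a rough sketch that assumes "80KB" means 80,000 characters and that the heuristic truncates to an integer; it also shows what the new 3.0 fallback ratio would report for the same payload.

```python
TOKEN_LIMIT = 25_000     # the "25k cap" from the example above
RESPONSE_CHARS = 80_000  # assumption: "80KB" treated as 80,000 characters


def heuristic_tokens(n_chars, chars_per_token):
    """Character-ratio estimate, truncated to an integer (assumed rounding)."""
    return int(n_chars / chars_per_token)


old_estimate = heuristic_tokens(RESPONSE_CHARS, 3.5)
new_fallback = heuristic_tokens(RESPONSE_CHARS, 3.0)

print(old_estimate, old_estimate > TOKEN_LIMIT)  # 22857 False
print(new_fallback, new_fallback > TOKEN_LIMIT)  # 26666 True
```

Under these assumptions the old ratio reports a value below the cap, while the more conservative fallback ratio alone would already flag the response.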
### TESTING INSTRUCTIONS
```bash
pytest tests/unit_tests/mcp_service/utils/test_token_utils.py -v
pytest tests/unit_tests/mcp_service/test_middleware.py -v
```
New unit tests cover:
- tiktoken-loaded path produces non-zero counts
- bytes input matches string input
- Length monotonicity (doubling input ≈ doubles count, ±10%)
- Fallback path when `_ENCODING is None` (tiktoken not installed) uses
`len/CHARS_PER_TOKEN`
- Defensive fallback when tiktoken's `encode` raises: the size guard must
never fail open
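The cases listed above could be exercised roughly as follows. This is an illustrative sketch, not the tests added in this PR: `estimate`, `WordEncoder`, and `BrokenEncoder` are hypothetical stand-ins, and the encoder is injected as a parameter here rather than read from a module-level `_ENCODING`.

```python
CHARS_PER_TOKEN = 3.0


def estimate(text, encoding=None):
    """Hypothetical estimator: use `encoding` if given, else the heuristic."""
    if isinstance(text, bytes):
        text = text.decode("utf-8", errors="replace")
    if encoding is not None:
        try:
            return len(encoding.encode(text))
        except Exception:
            pass  # fall back instead of failing open
    return int(len(text) / CHARS_PER_TOKEN)


class WordEncoder:
    """Stand-in tokenizer: one token per whitespace-separated word."""

    def encode(self, text):
        return text.split()


class BrokenEncoder:
    """Stand-in tokenizer whose encode always fails."""

    def encode(self, text):
        raise RuntimeError("simulated tokenizer failure")


def test_bytes_matches_str():
    enc = WordEncoder()
    assert estimate(b'{"a": 1}', enc) == estimate('{"a": 1}', enc)


def test_doubling_roughly_doubles():
    enc = WordEncoder()
    payload = '{"column": "value"} ' * 50
    one, two = estimate(payload, enc), estimate(payload * 2, enc)
    assert abs(two - 2 * one) <= 0.1 * (2 * one)


def test_fallback_without_encoding():
    assert estimate("x" * 30) == 10  # 30 chars / 3.0 chars per token


def test_encode_failure_falls_back():
    assert estimate("x" * 30, BrokenEncoder()) == 10
```

Run under pytest, each function is collected as a separate test; the last one checks that a failing encoder yields the same number as the heuristic path.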
### ADDITIONAL INFORMATION
- **New dependency**: `tiktoken>=0.7.0,<1.0` added to the `fastmcp` extra in
`pyproject.toml`. Anyone installing `apache-superset[fastmcp]` gets it
automatically. `requirements/base.txt` and `requirements/development.txt`
regenerated via `scripts/uv-pip-compile.sh`.
- **No per-call network requests**: tiktoken tokenizes locally (it fetches and
caches the `cl100k_base` vocabulary file on first use). Anthropic's
`count_tokens` API is more accurate but adds a network roundtrip per tool
result, which is too expensive for synchronous middleware.
- **Behavioral change**: token estimates for the same content will now differ
from, and be more accurate than, the previous ones. Deployments relying on a
specific cap will see different effective behavior: typically slightly more
conservative truncation for English-text-heavy responses, slightly less for
highly repetitive content (BPE compresses repetition).
- [ ] Has associated issue:
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in
[SIP-59](https://github.com/apache/superset/issues/13351))
- [ ] Introduces new feature or API
- [ ] Removes existing feature or API