[PR] fix(mcp): use tiktoken for response-size-guard token estimation [superset]

via GitHub Wed, 06 May 2026 09:52:15 -0700


aminghadersohi opened a new pull request, #39912:
URL: https://github.com/apache/superset/pull/39912


   ### SUMMARY
   
   The MCP response-size-guard middleware (`ResponseSizeGuardMiddleware`) 
estimates token counts to decide when to truncate or block oversized tool 
responses. The existing estimator at 
`superset/mcp_service/utils/token_utils.py` used a simple char-to-token 
heuristic (`CHARS_PER_TOKEN = 3.5`) that under-counts JSON-heavy MCP responses 
relative to Claude's actual tokenizer. Some responses could therefore slip past 
the configured token limit yet still be truncated by the Claude Agent SDK's own 
threshold. The SDK then saved them to a file the model could not read back, 
causing 120s timeouts in tool calls like `get_dataset_info` for wide datasets.
   
   This PR switches the estimator to **tiktoken's `cl100k_base` encoding** — a 
real BPE tokenizer with a vocabulary similar to Claude's. For English and 
JSON-heavy content it tracks Claude's counts within roughly ±10%, which is far 
closer than any character-ratio heuristic.
   
   The previous heuristic stays as a graceful **fallback** for environments 
where tiktoken is not installed; its ratio drops from 3.5 to 3.0 chars/token to 
be more conservative for JSON content (which the old ratio under-counted).
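
   A minimal sketch of the estimator shape described above, not the code in 
this PR. It reuses the `_ENCODING` and `CHARS_PER_TOKEN` names mentioned in the 
testing notes; the `encoding` parameter, the `math.ceil` rounding, and the 
bytes decoding are illustrative assumptions.

```python
import math

# Fallback ratio quoted in the PR description (previously 3.5).
CHARS_PER_TOKEN = 3.0

try:
    import tiktoken

    _ENCODING = tiktoken.get_encoding("cl100k_base")
except Exception:
    # tiktoken is missing, or its encoding data could not be loaded.
    _ENCODING = None


def estimate_token_count(text, encoding=_ENCODING):
    """Estimate the token count of a str or bytes payload without raising."""
    if isinstance(text, bytes):
        # Assumption: payloads are UTF-8; undecodable bytes are replaced.
        text = text.decode("utf-8", errors="replace")
    if encoding is not None:
        try:
            # disallowed_special=() treats special-token text as plain text
            # instead of raising on it.
            return len(encoding.encode(text, disallowed_special=()))
        except Exception:
            # Fall through to the heuristic so the guard still gets a number.
            pass
    return math.ceil(len(text) / CHARS_PER_TOKEN)
```

   Passing `encoding=None` forces the heuristic path, which is a convenient 
way to exercise the fallback in a test without uninstalling tiktoken.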
   
   ### BEFORE/AFTER
   
   ```
   estimate_token_count("an 80 KB JSON dataset info response")
     before (3.5 chars/token):       ~22,800 tokens (slipped past 25k cap)
     after (tiktoken cl100k_base):   accurate Claude-aligned count
   ```
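
   The arithmetic behind the "before" figure can be checked with a short 
sketch. It assumes the response is exactly 80,000 characters and that the 
heuristic rounds up; `heuristic_tokens` is a hypothetical helper, not a 
function from the PR. The 25k cap is the one quoted above.

```python
import math


def heuristic_tokens(num_chars, chars_per_token):
    """Character-ratio estimate, rounded up (the rounding is an assumption)."""
    return math.ceil(num_chars / chars_per_token)


TOKEN_CAP = 25_000       # cap quoted in the example above
RESPONSE_CHARS = 80_000  # assumed size of the "80 KB" response

old_estimate = heuristic_tokens(RESPONSE_CHARS, 3.5)  # 22858
new_fallback = heuristic_tokens(RESPONSE_CHARS, 3.0)  # 26667

# The old ratio stays under the cap; the new fallback ratio exceeds it.
old_passes_guard = old_estimate <= TOKEN_CAP
new_passes_guard = new_fallback <= TOKEN_CAP
```

   Under these assumptions the same payload passes the guard at 3.5 
chars/token and is caught at 3.0, even before tiktoken is involved.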
   
   ### TESTING INSTRUCTIONS
   
   ```bash
   pytest tests/unit_tests/mcp_service/utils/test_token_utils.py -v
   pytest tests/unit_tests/mcp_service/test_middleware.py -v
   ```
   
   New unit tests cover:
   - tiktoken-loaded path produces non-zero counts
   - bytes input matches string input
   - Length monotonicity (doubling input ≈ doubles count, ±10%)
   - Fallback path when `_ENCODING is None` (tiktoken not installed) uses 
`len/CHARS_PER_TOKEN`
   - Defensive fallback when tiktoken's `encode` raises; the size guard must 
never fail open
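
   A rough sketch of how three of these checks might look, written against a 
hypothetical stand-in `estimate` function rather than the real 
`token_utils` module. The function name, the injected `encode` callable, and 
the sample strings are all assumptions made for illustration.

```python
import math

CHARS_PER_TOKEN = 3.0


def estimate(text, encode=None):
    """Stand-in estimator: use `encode` if given, else the char heuristic."""
    if isinstance(text, bytes):
        text = text.decode("utf-8", errors="replace")
    if encode is not None:
        try:
            return len(encode(text))
        except Exception:
            pass  # fall back to the heuristic instead of propagating
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def test_bytes_matches_str():
    s = '{"column": "revenue"}'
    assert estimate(s.encode("utf-8")) == estimate(s)


def test_doubling_roughly_doubles():
    s = '{"column": "revenue", "type": "FLOAT"}' * 50
    single, double = estimate(s), estimate(s + s)
    # Allow 10% slack around exactly twice the single count.
    assert abs(double - 2 * single) <= 0.1 * 2 * single


def test_encode_failure_falls_back():
    def broken(_text):
        raise RuntimeError("tokenizer failure")

    # 30 characters at 3.0 chars/token gives 10 tokens via the heuristic.
    assert estimate("x" * 30, encode=broken) == 10
```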
   
   ### ADDITIONAL INFORMATION
   
   - **New dependency**: `tiktoken>=0.7.0,<1.0` added to the `fastmcp` extra in 
`pyproject.toml`. Anyone installing `apache-superset[fastmcp]` gets it 
automatically. `requirements/base.txt` and `requirements/development.txt` 
regenerated via `scripts/uv-pip-compile.sh`.
   - **No network calls**: tiktoken is pure offline tokenization. Anthropic's 
`count_tokens` API is more accurate but adds a network roundtrip per tool 
result, which is too expensive for synchronous middleware.
   - **Behavioral change**: the same content will now produce different (more 
accurate) token estimates than before. Sites relying on a specific cap will 
see different effective behavior: typically slightly more conservative 
truncation for English-text-heavy responses, and slightly less for highly 
repetitive content (BPE compresses repetition).
   
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in 
[SIP-59](https://github.com/apache/superset/issues/13351))
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

