andreahlert opened a new pull request, #12: URL: https://github.com/apache/comdev/pull/12
## Problem

`search_list` currently formats the first 30 emails of the backend response and prints `... and N more emails` for the rest. Because there is no `limit` or `offset` parameter, the remaining `N` entries are unreachable: re-querying the backend just returns the same first 30. For LLM clients doing list-wide analysis (e.g. enumerating every thread on a list, walking a month of archive), this hides most of the data behind a hard wall.

Concrete example I hit while writing this PR: `dev-magpie@airflow.apache.org` has 93 emails / 46 threads. Calling `search_list` returns 30; the other 63 emails are invisible. Working around it requires chaining many narrow `from:` / `subject:` queries hoping to cover the full set, with no guarantee of completeness.

## Solution

Two optional parameters on `search_list`:

- `limit` (int, 1..200, default 30): caps the rendered window
- `offset` (int, >= 0, default 0): skips the first N entries

Defaults preserve current behaviour. The backend call is unchanged; both parameters only affect which slice of the already-fetched result set is formatted into the response. No new API hits, no new auth surface, no new env vars.

The trailing summary is upgraded:

- Before: `... and N more emails` (no way to reach them)
- After: `Showing X-Y of Z.` plus `... K more emails. Re-query with offset=N to continue.` (deterministic pagination)

## Why now

This is the first thing a long-running LLM session needs that the MCP doesn't have. The original 30-cap is the right default for browse-style use, but offers no escape hatch for analysis-style use. The 200 ceiling on `limit` keeps the worst-case response size sane (~40KB of summaries) while removing the "you can only ever see 30" wall.

## Backwards compatibility

- No schema field renamed or removed.
- Calling without `limit`/`offset` produces the same first 30 as before; only the trailing line text changes (`Showing 1-30 of 93.` vs `... and 63 more emails`).
- No change to `get_email`, `get_thread`, `get_mbox`, or any auth/lifecycle tool.

## Test plan

- [x] `npm test` (43 existing tests pass)
- [x] `node --check index.js` (syntax clean)
- [ ] Manual smoke test against `lists.apache.org`:
  - [ ] `search_list(list=dev-magpie, domain=airflow.apache.org)`: same first 30 as before, new footer line
  - [ ] `search_list(... limit=100)`: renders up to 93 (the full hit set)
  - [ ] `search_list(... limit=30, offset=30)`: renders emails 31-60, footer hints `offset=60`
  - [ ] `search_list(... offset=999)`: renders 0 emails, "offset is past the end" footer

## Notes for reviewers

- I considered cursor-based pagination, but the backend `/api/stats.lua` returns the full set in one response, so an opaque cursor would just be a re-encoded numeric offset. Plain `offset` is simpler and equivalent.
- The same cap pattern exists on `parts.slice(0, 15)` (participants) and `truncate(text, 8000/4000/10000)` (bodies / mbox). Not touched here; the scope of this PR is just `search_list`. Happy to follow up with a second PR if the design lands well.
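
For illustration, the windowing and footer described under Solution could be sketched roughly as below. This is a minimal sketch and not the code in this PR. `renderWindow` and `clamp` are hypothetical names, and the exact wording of the out-of-range footer is assumed. Clamping out-of-range `limit`/`offset` values instead of rejecting them is also an assumption.

```javascript
// Hypothetical helper, not the PR's implementation: slices an already-fetched
// result set and builds the pagination footer described above.
const clamp = (n, lo, hi) => Math.min(Math.max(n, lo), hi);

function renderWindow(emails, limit = 30, offset = 0) {
  // Assumption: out-of-range values are clamped to the documented bounds
  // (limit 1..200, offset >= 0) rather than rejected.
  const size = clamp(limit, 1, 200);
  const start = Math.max(offset, 0);
  const total = emails.length;

  // No extra backend call: only the rendered slice changes.
  const page = emails.slice(start, start + size);

  let footer;
  if (page.length === 0) {
    // Assumed wording for the empty and out-of-range cases.
    footer = total === 0
      ? "No emails found."
      : `offset is past the end (${total} emails total).`;
  } else {
    const end = start + page.length; // count of entries consumed so far
    footer = `Showing ${start + 1}-${end} of ${total}.`;
    const remaining = total - end;
    if (remaining > 0) {
      footer += ` ... ${remaining} more emails. Re-query with offset=${end} to continue.`;
    }
  }
  return { page, footer };
}

// Example: a 93-entry result set, second page of 30.
const sample = Array.from({ length: 93 }, (_, i) => ({ id: i + 1 }));
console.log(renderWindow(sample, 30, 30).footer);
```

Because the footer always names the next `offset`, a client can walk the whole set by repeating the call until no "Re-query" hint appears.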
