dosubot[bot] commented on issue #8842: URL: https://github.com/apache/incubator-devlake/issues/8842#issuecomment-4285448025
Thanks for the thorough write-up. This is a real and currently unhandled limitation.

I can confirm from the code that the `createdAfter` parameter in the `Query` hook is accepted but **never forwarded** to the GitHub API [[1]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/github/tasks/cicd_run_collector.go#L77-L85). The only query params sent are `page` and `per_page`, so all time filtering happens client-side after fetching.

The code already documents a related workaround: using `status=completed` hits a separate 34-page cap, so that filter was deliberately removed in favor of client-side filtering [[2]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/github/tasks/cicd_run_collector.go#L79-L81). But the broader 40k-item cap on unfiltered `/actions/runs` pagination is not handled anywhere, and no existing PR or issue resolution addresses it.

Your proposed adaptive bisection approach using the `created` query parameter looks like the right direction. A few notes that may help if you submit a PR:

- The `createdAfter` parameter is already wired into the `Query` function signature via the `ApiCollectorStateManager` pattern, so the plumbing is in place; it just needs to be used [[1]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/github/tasks/cicd_run_collector.go#L77-L85).
- The `GetCreated` callback (lines 111-118) already extracts `created_at` from each run, so the framework can determine when to stop fetching within a window [[3]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/github/tasks/cicd_run_collector.go#L111-L118).
- `PAGE_SIZE` is intentionally set to 30 (not 100) to avoid response body size issues referenced in #3199 [[4]](https://github.com/apache/incubator-devlake/blob/2c15ad880dce830c6512fa58039527cbf4745b8b/backend/plugins/github/tasks/cicd_run_collector.go#L39-L42). With `per_page=30`, the 40k cap is hit around page 1,334 (40,000 / 30 ≈ 1,333.3). Switching to `per_page=100` only moves the cap to page 400, which is the same 40,000 items, so increasing page size alone doesn't solve it.
- For reference, a similar pattern was implemented for GitLab in PR #8572, which introduced keyset pagination to work around GitLab's 50k offset pagination limit [[5]](https://github.com/apache/incubator-devlake/pull/8572). The adaptive windowing you describe would be the REST-specific equivalent for GitHub's `/actions/runs`.

One thing to watch out for: the filtered mode (`created` param) has its own 1,000-result-per-search cap, so the bisection logic needs to detect when `total_count` hits 1,000 and subdivide further, as you noted in the pseudocode.
