danielemoraschi opened a new issue, #8614: URL: https://github.com/apache/incubator-devlake/issues/8614
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues.

### What happened

This is a follow-up to the discussion in #8576.

When running a pipeline on a project that includes a very large GitHub repository (5+ years of history, hundreds of commits weekly), the `github_graphql` task for collecting job runs hangs for days. The logs show that the process eventually fails, _likely_ due to a `stream error: CANCEL` received from the peer (GitHub's servers).

As identified by @klesh in the original thread, this is likely caused by the API response body size exceeding a limit on GitHub's end, which leads the server to terminate the connection.

Errors in the logs:

```
time="2025-10-15 13:01:52" level=warning msg=" [pipeline service] [pipeline #63] [task #16353] retry #1 graphql calling after 120s\n\tcaused by: stream error: stream ID 1; CANCEL; received from peer"
time="2025-10-15 13:04:03" level=warning msg=" [pipeline service] [pipeline #63] [task #16353] retry #2 graphql calling after 120s\n\tcaused by: non-200 OK status code: 502 Bad Gateway body: \"<html>\\r\\n<head><title>502 Bad Gateway</title></head>\\r\\n<body>\\r\\n<center><h1>502 Bad Gateway</h1></center>\\r\\n<hr><center>nginx</center>\\r\\n</body>\\r\\n</html>\\r\\n\""
```

### What do you expect to happen

The pipeline should handle large API responses by either completing the data collection successfully or failing with an error message about exceeding API limits, rather than hanging indefinitely.

### How to reproduce

- Configure a DevLake project with a connection to a very large GitHub repository.
- Create and run a new pipeline that includes collecting GitHub Actions data.
- Monitor the devlake container logs.
- Observe that the pipeline hangs on the `github_graphql` task, specifically "Collect Job Runs".
- After a long period, the errors shown above appear in the logs.

### Anything else

Attaching a new snapshot log file.
[task-16353-2-1-github_graphql.log](https://github.com/user-attachments/files/22946344/task-16353-2-1-github_graphql.log)

### Version

v1.0.3-beta6@44f2db2

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
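
To make the expected behavior concrete (finish the collection or fail with a clear error, rather than retrying indefinitely), here is a minimal sketch. It is not DevLake's collector code, and it assumes the failures are tied to response size, as suggested in #8576. The names `fetch_with_shrinking_pages`, `fetch_page`, `TransientAPIError`, and `PageTooLargeError` are hypothetical and exist only for illustration.

```python
class TransientAPIError(Exception):
    """Stands in for failures such as `stream error: CANCEL` or `502 Bad Gateway`."""


class PageTooLargeError(Exception):
    """Raised when the request keeps failing and the retry budget is spent."""


def fetch_with_shrinking_pages(fetch_page, page_size=100, min_page_size=1, max_attempts=5):
    """Call fetch_page(size), halving the page size after each transient failure.

    fetch_page is a hypothetical caller-supplied callable, not a DevLake or
    GitHub API. Returns (result, page_size_used) on success.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    size = page_size
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_page(size), size
        except TransientAPIError as err:
            last_error = err
            # Stop when the budget is spent or the page cannot shrink further.
            if attempt == max_attempts or size <= min_page_size:
                break
            size = max(min_page_size, size // 2)

    # Fail loudly with context instead of hanging on endless retries.
    raise PageTooLargeError(
        f"gave up after {attempt} attempt(s) at page size {size}: {last_error}"
    )
```

A caller would wrap its own request function in `fetch_page` and treat `PageTooLargeError` as a terminal task failure, so the pipeline reports the problem instead of appearing to hang.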
