danielemoraschi opened a new issue, #8614:
URL: https://github.com/apache/incubator-devlake/issues/8614

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   This is a follow-up to the discussion in #8576.
   
   When running a pipeline on a project that includes a very large GitHub 
repository (5+ years of history, hundreds of commits weekly), the `github_graphql` 
task for collecting job runs hangs for days. The logs show that the process 
eventually fails, _likely_ due to a `stream error: CANCEL` received from the 
peer (GitHub's servers).
   
   As identified by @klesh in the original thread, this is likely caused by the 
API response body size exceeding a limit on GitHub's end, which leads to the 
server terminating the connection.
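
If the response size is indeed the cause, one possible mitigation is to request smaller pages whenever a call fails. The following is a minimal sketch of that idea, not DevLake's actual implementation: `ResponseTooLarge`, `collect_with_shrinking_pages`, and `fake_fetch` are hypothetical names, and offset-based paging is a simplification of the cursor-based paging a GraphQL API would normally use.

```python
from typing import Callable, List


class ResponseTooLarge(Exception):
    """Hypothetical stand-in for a CANCEL stream error or a 502 response."""


def collect_with_shrinking_pages(
    fetch_page: Callable[[int, int], List[dict]],
    total: int,
    page_size: int = 100,
    min_page_size: int = 1,
) -> List[dict]:
    """Collect `total` records, halving the page size after each failed request."""
    records: List[dict] = []
    offset = 0
    while offset < total:
        size = min(page_size, total - offset)
        try:
            page = fetch_page(offset, size)
        except ResponseTooLarge:
            if size <= min_page_size:
                raise  # cannot shrink further, so surface the error
            page_size = max(min_page_size, size // 2)
            continue
        if not page:
            break  # defensive: avoid looping forever on an empty page
        records.extend(page)
        offset += len(page)
    return records


def fake_fetch(offset: int, size: int) -> List[dict]:
    """Simulated API (an assumption): rejects any page larger than 30 records."""
    if size > 30:
        raise ResponseTooLarge(f"page of {size} records is too large")
    return [{"id": i} for i in range(offset, offset + size)]
```

With the simulated fetcher, `collect_with_shrinking_pages(fake_fetch, total=250)` first fails at 100 and 50 records per page, then completes using pages of 25.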
   
   Errors in the logs:
   ```
   time="2025-10-15 13:01:52" level=warning msg=" [pipeline service] [pipeline 
#63] [task #16353] retry #1 graphql calling after 120s\n\tcaused by: stream 
error: stream ID 1; CANCEL; received from peer"
   time="2025-10-15 13:04:03" level=warning msg=" [pipeline service] [pipeline 
#63] [task #16353] retry #2 graphql calling after 120s\n\tcaused by: non-200 OK 
status code: 502 Bad Gateway body: \"<html>\\r\\n<head><title>502 Bad 
Gateway</title></head>\\r\\n<body>\\r\\n<center><h1>502 Bad 
Gateway</h1></center>\\r\\n<hr><center>nginx</center>\\r\\n</body>\\r\\n</html>\\r\\n\""
   ```
   
   ### What do you expect to happen
   
   The pipeline should handle large API responses by either completing the data 
collection successfully or failing with an error message about exceeding API 
limits, rather than hanging indefinitely.
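
The "fail with an error" behavior could look roughly like the sketch below. It is an illustration under assumptions, not how DevLake handles retries: `CollectionFailed` and `call_with_bounded_retries` are hypothetical names, and the only value taken from the logs above is the 120-second wait between retries.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CollectionFailed(Exception):
    """Raised once retries are exhausted, so the task fails instead of hanging."""


def call_with_bounded_retries(
    call: Callable[[], T],
    max_attempts: int = 5,
    wait_seconds: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on any exception up to `max_attempts` times in total."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as err:
            last_error = err
            if attempt < max_attempts:
                sleep(wait_seconds)  # wait before the next attempt
    raise CollectionFailed(
        f"giving up after {max_attempts} attempts; last error: {last_error}. "
        "The response may be exceeding an API size limit."
    ) from last_error
```

The `sleep` parameter is injectable only so that the waiting can be replaced in tests; the default waits for real.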
   
   ### How to reproduce
   
   - Configure a DevLake project with a connection to a very large GitHub 
repository.
   - Create and run a new pipeline that includes collecting GitHub Actions data.
   - Monitor the devlake container logs.
   - Observe that the pipeline hangs on the `github_graphql` task, specifically 
"Collect Job Runs".
   - After a long period, the errors shown above appear in the logs.
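
When monitoring the logs during the steps above, the retry warnings can be picked out with a small script. This is only a sketch: the pattern is derived from the two warning lines quoted in this report and may not match other DevLake log formats, and `RETRY_RE` and `parse_retry_line` are hypothetical names.

```python
import re
from typing import Optional

# Pattern based solely on the warning lines quoted in this report.
# In the log, "\n" and "\t" appear as literal backslash sequences.
RETRY_RE = re.compile(
    r'time="(?P<time>[^"]+)".*?'
    r"\[pipeline #(?P<pipeline>\d+)\] \[task #(?P<task>\d+)\] "
    r"retry #(?P<retry>\d+) graphql calling after (?P<wait>\d+)s"
    r"\\n\\tcaused by: (?P<cause>.*)"
)


def parse_retry_line(line: str) -> Optional[dict]:
    """Return the retry details from one log line, or None if it is not a retry."""
    match = RETRY_RE.search(line)
    if match is None:
        return None
    return {
        "time": match.group("time"),
        "pipeline": int(match.group("pipeline")),
        "task": int(match.group("task")),
        "retry": int(match.group("retry")),
        "wait_seconds": int(match.group("wait")),
        "cause": match.group("cause").rstrip('"'),
    }


# First warning line from this report, joined back onto a single line.
SAMPLE = (
    r'time="2025-10-15 13:01:52" level=warning msg=" [pipeline service] '
    r"[pipeline #63] [task #16353] retry #1 graphql calling after 120s"
    r'\n\tcaused by: stream error: stream ID 1; CANCEL; received from peer"'
)
```

Calling `parse_retry_line(SAMPLE)` yields the pipeline, task, and retry numbers along with the stated cause; lines that are not retry warnings return `None`.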
   
   ### Anything else
   
   A new snapshot log file is attached:
   
[task-16353-2-1-github_graphql.log](https://github.com/user-attachments/files/22946344/task-16353-2-1-github_graphql.log)
   
   ### Version
   
   v1.0.3-beta6@44f2db2
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
