renflo opened a new issue, #5849: URL: https://github.com/apache/incubator-devlake/issues/5849
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues.

### What happened

# Some context:

1. I am using devlake at work, primarily to collect DORA metrics, and have therefore configured tens of devlake projects.
2. Some of the repositories scanned by the devlake projects have thousands of jobs.
3. To keep data fresh, each devlake project is scheduled to run daily.
4. I've struggled a bit with parallelism issues: my deployment is on k8s and I have scaled down to 1 lake pod to minimize them. But they are not entirely gone.

# The issue

The biggest issue I have right now is that some projects fail to gather data from gitlab, which means that even if the following steps in the devlake pipeline do succeed, the resulting data is erroneous/mangled. The screenshot below shows what such a failed pipeline looks like.

[image]

That's very problematic because it happens relatively often and greatly diminishes the value of our DORA statistics.

# Available information

The "Task failed" popups (which I hope you can [improve](https://github.com/apache/incubator-devlake/issues/5847)) show that some kind of error is occurring.

## Slice index out of range

[image]

The first one, about the slice index out of range, does not make any sense to me, but it sounds like a plain bug. I can find it in the logs, formatted in a more friendly way:

[image]

Hope you can look into it, but I don't think it's the root cause of my issue - the next one is.

## Too Many Requests

[image]

The second error message is much more interesting. I find it in the logs as well and it seems pretty obvious that gitlab is not accepting the number of requests devlake sends to it. Do note that the following mentions a specific job: this job does exist in gitlab, so it has nothing to do with its existence.

```
stack trace
-- stack trace:
| github.com/apache/incubator-devlake/core/runner.RunPluginSubTasks
| /app/core/runner/run_task.go:274
| [...repeated from below...]
Wraps: (2) subtask collectApiJobs ended unexpectedly
Wraps: (3) attached stack trace
-- stack trace:
| github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*WorkerScheduler).WaitAsync
| /app/helpers/pluginhelper/api/worker_scheduler.go:173
| github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*ApiCollector).Execute
| /app/helpers/pluginhelper/api/api_collector.go:197
| github.com/apache/incubator-devlake/helpers/pluginhelper/api.(*ApiCollectorStateManager).Execute
| /app/helpers/pluginhelper/api/api_collector_with_state.go:112
| github.com/apache/incubator-devlake/plugins/gitlab/tasks.CollectApiJobs
| /app/plugins/gitlab/tasks/job_collector.go:128
| github.com/apache/incubator-devlake/core/runner.runSubtask
| /app/core/runner/run_task.go:331
| github.com/apache/incubator-devlake/core/runner.RunPluginSubTasks
| /app/core/runner/run_task.go:272
| github.com/apache/incubator-devlake/core/runner.RunPluginTask
| /app/core/runner/run_task.go:158
| github.com/apache/incubator-devlake/core/runner.RunTask
| /app/core/runner/run_task.go:134
| github.com/apache/incubator-devlake/server/services.runTaskStandalone
| /app/server/services/task_runner.go:131
| github.com/apache/incubator-devlake/server/services.RunTasksStandalone.func1
| /app/server/services/task.go:186
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1598
Wraps: (4)
| combined messages:
| {
| Retry exceeded 9 times calling projects/26623118/jobs/3134745242. The last error was: Http DoAsync error calling [GET projects/26623118/jobs/3134745242].
Response: <!DOCTYPE html>
| <html>
| <head>
| <meta content="width=device-width, initial-scale=1, maximum-scale=1" name="viewport">
| <title>429 Too Many Requests</title>
| <style>
| body {
| color: #666;
| text-align: center;
| font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
| margin: auto;
| font-size: 14px;
| }
|
| h1 {
| font-size: 56px;
| line-height: 100px;
| font-weight: normal;
| color: #456;
| }
|
| h2 {
| font-size: 24px;
| color: #666;
| line-height: 1.5em;
| }
|
| h3 {
| color: #456;
| font-size: 20px;
| font-weight: normal;
| line-height: 28px;
| }
|
| hr {
| max-width: 800px;
| margin: 18px auto;
| border: 0;
| border-top: 1px solid #EEE;
| border-bottom: 1px solid white;
| }
|
| img {
| max-width: 40vw;
| }
|
| .container {
| margin: auto 20px;
| }
| </style>
| </head>
|
| <body>
| <h1>
| <img src=" IGQ9Ik0xOTcuODU0NCA4NC43MzQyaC01NC4xNTNsMjMuMjczLTcxLjYyNWMxLjE5Ny0zLjY4NiA2LjQxMS0zLjY4NSA3LjYwOCAwbDIzLjI3MiA3MS42MjV6IiBmaWxsPSIjZTI0MzI5Ii8+Cjwvc3ZnPgo=" alt="GitLab Logo" /><br />
| 429
| </h1>
| <div class="container">
| <h3>Too many requests received.</h3>
| <hr />
| <p>Please try again later.</p>
| </div>
| </body>
| </html>
| (429)
| =====================
| Retry exceeded 9 times calling projects/26623118/jobs/3134745244. The last error was: Http DoAsync error calling [GET [...]
```

### What do you expect to happen

I expect the pipeline to successfully fetch data from gitlab, regardless of the size of the repository, the number of other concurrent pipelines, or the throttling mechanisms in place at gitlab.

### How to reproduce

I cannot share the code I'm scanning. But to reproduce, I would imagine that scanning a repository with a large enough number of already-run jobs (thousands? hundreds? I must say that I don't know) should do it.

### Anything else

Throttling and limiting the rate of API requests to SaaS systems is common - devlake will likely offer many integrations that have throttling / rate limiting mechanisms in place.
In my case it's gitlab but it could be anything. Devlake needs to offer a solution to this.

Observations that could perhaps lead to a solution:

1. I suspect that devlake fetches ALL data from gitlab each time the pipeline runs (based on the fact that runs always seem to take a very long time, even from one day to the next). If that's the case, an obvious optimization would be to fetch only data created AFTER the latest collected data found in the devlake db.
2. I am not sure what happens when I receive 429 Too Many Requests: some data has always been gathered before the error occurs. Has it been stored? I suspect that it hasn't, and if it's discarded it really is a shame. Some data would be better than no data. In addition, combined with 1, that would guarantee that a pipeline failing because of 429 would anyway gradually collect more and more data, and would at some point "catch up" and be able to run without failing.

### Version

0.18.0-beta4

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
