benjessop12 opened a new issue, #931: URL: https://github.com/apache/incubator-devlake/issues/931
**Description**

👋 While scraping metrics from a large GitLab project (hundreds of thousands of pipelines, tens of thousands of merge requests), the triggered pipeline that collects the metrics via the GitLab API caused enough extra load that users noticed degraded performance in the UI. The logs showed roughly 19-20 of these calls per second:

`GET https://<gitlab_endpoint>/api/v4/projects/<project_id>/merge_requests/<merged_request_id>/notes?system=false&per_page=1&page=0`

**Proposed solution**

_Note: I checked for anything existing related to throttling but couldn't find it, even in the [GitLab](https://github.com/merico-dev/lake/blob/main/plugins/gitlab/README.md) documentation._

It would be nice to be able to rate limit the number of API calls made per unit of time, or to set a float value to "wait" between API calls, in an effort to reduce the load. If this could be configured via the config-ui, that would be fantastic.

**Has the Feature been Requested Before?**

I couldn't find any when searching for similar keywords. Feel free to close if this request is a duplicate.

**Describe alternatives you've considered**

An alternative feature would be to configure the number of pipelines, merge requests, etc. that are queried (this is scoped to GitLab at the moment, but could be applied to other integrated services). For example, if I could configure only the last 10_000 merge requests and 25_000 pipelines to be scraped, that would reduce the time the scraping runs for and keep the queried data focused on more recent activity.
