benjessop12 opened a new issue, #931:
URL: https://github.com/apache/incubator-devlake/issues/931

   **Description**
   
   👋 While scraping metrics from a large GitLab project (hundreds of 
thousands of pipelines, tens of thousands of merge requests), the triggered 
pipeline that collects the metrics via the GitLab API increased the load on 
the GitLab instance to the point that users noticed degraded performance in 
the UI.
   
   The logs showed roughly 19-20 of these calls per second:
   `GET https://<gitlab_endpoint>/api/v4/projects/<project_id>/merge_requests/<merged_request_id>/notes?system=false&per_page=1&page=0`
   
   **Proposed solution**
   _Note: I looked for an existing throttling option but couldn't find one, 
including in the 
[GitLab](https://github.com/merico-dev/lake/blob/main/plugins/gitlab/README.md) 
documentation._
   
   It would be nice to be able to rate limit the number of API calls made 
within a configurable time window, or to set a float value to wait between API 
calls, in an effort to reduce the load. If this could be configured via the 
config-ui, that would be fantastic.
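   
   To illustrate the "wait between API calls" idea, here is a minimal sketch 
in Python. It is not DevLake code: the class name `MinIntervalThrottle` and 
its parameters are hypothetical, and it only shows one way a fixed minimum 
delay between requests could be enforced.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Enforce a minimum delay between consecutive calls.

    Hypothetical helper for illustration only; not part of DevLake.
    """

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if min_interval < 0:
            raise ValueError("min_interval must be non-negative")
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        # Earliest time at which the next call may start (None before first call).
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed and return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            # Re-read the clock, but never schedule earlier than planned.
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval
        return delay
```

   A collector would call `wait()` before each request. For example, a 
`min_interval` of 0.2 would cap the collector at about 5 calls per second, 
compared with the 19-20 per second seen in the logs.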
   
   **Has the Feature been Requested Before?**
   I couldn't find any by searching for similar keywords. Feel free to close 
if this request is a duplicate.
   
   **Describe alternatives you've considered**
   An alternative would be to cap the number of pipelines, merge requests, 
etc. that are queried (this is written with GitLab in mind, but it could apply 
to the other integrated services). For example, if I could limit scraping to 
the last 10,000 merge requests and 25,000 pipelines, the scrape would run for 
less time and the data available for querying would focus on the most recent 
activity.
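   
   A rough sketch of how such a cap could work, again in Python and not taken 
from DevLake: `collect_most_recent` and the `fetch_page` callable are 
hypothetical names, and the sketch assumes the API returns records newest 
first, one page at a time.

```python
from typing import Callable, Iterator, List


def collect_most_recent(
    fetch_page: Callable[[int, int], List[dict]],
    max_items: int,
    per_page: int = 100,
) -> Iterator[dict]:
    """Yield at most max_items records, stopping pagination early.

    fetch_page(page, per_page) is a hypothetical callable that returns one
    page of records (assumed newest first) and an empty list when no pages
    remain.
    """
    remaining = max_items
    page = 1
    while remaining > 0:
        records = fetch_page(page, per_page)
        if not records:
            break  # no more data on the server
        batch = records[:remaining]
        yield from batch
        remaining -= len(batch)
        if len(records) < per_page:
            break  # short page means this was the last one
        page += 1
```

   With `max_items=10_000` and `per_page=100`, the collector would issue at 
most 100 list requests for merge requests, however large the project is.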
   
   

