elijah-roberts opened a new issue, #6612: URL: https://github.com/apache/incubator-devlake/issues/6612
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues.

### What happened

We have a large GitHub repo with a lot of activity (commits, branches, pull requests), and pull request metrics were not generated for all known pull requests (approximately 21K). The repo has a lot of pull requests, both historically and on an ongoing basis.

We initially started with 18 months of data (starting in the middle of 2022) and would only get metrics through July 2023. By removing pull requests created before 2023 (8K), we were able to process through October 2023. Here is an example of the distribution:

```
select count(id), MONTH(created_date) as month from pull_requests group by month;
```

<img width="462" alt="12 rows" src="https://github.com/apache/incubator-devlake/assets/19742213/c66353b0-8b07-4bc1-83bf-7f534dad9f80">

```
select count(number), MONTH(github_created_at) month from _tool_github_pull_requests Where connection_id = 3 GROUP BY month;
```

<img width="492" alt="12 rows v" src="https://github.com/apache/incubator-devlake/assets/19742213/559533ef-631c-494d-9005-c2e550144d30">

We were able to further mitigate our issue by reducing the number of pull requests down to six months of data. Doing this and re-running the dora task for lead time for changes correctly populated the pull request metrics for all known pull requests. But this is a short-term fix, because as we continue to load data we will eventually hit the same limit.

When reviewing the logs for the dora metric task, no errors are reported; it just appears that at a certain point the task stops processing data.
The only thing I can figure is that there is some type of limit on the cursor object preventing it from loading all of the rows: https://github.com/apache/incubator-devlake/blob/863863aa5618705f40957c21be099b6c84de62cb/backend/plugins/dora/tasks/change_lead_time_calculator.go#L51 Example logs: ``` time="2023-12-02 17:29:49" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13394" time="2023-12-02 17:29:50" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13395" time="2023-12-02 17:29:50" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31885 is nil\n" time="2023-12-02 17:29:50" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13396" time="2023-12-02 17:29:51" level=info msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13397" time="2023-12-02 17:29:51" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13398" time="2023-12-02 17:29:52" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13399" time="2023-12-02 17:29:52" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] [*crossdomain.ProjectPrMetric] batch save flush total 100 records to database" time="2023-12-02 17:29:52" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13400" time="2023-12-02 17:29:53" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13401" time="2023-12-02 17:29:53" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13402" time="2023-12-02 17:29:54" level=info msg=" [pipeline service] 
[pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13403" time="2023-12-02 17:29:54" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13404" time="2023-12-02 17:29:55" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13405" time="2023-12-02 17:29:55" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13406" time="2023-12-02 17:29:56" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13407" time="2023-12-02 17:29:56" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31898 is nil\n" time="2023-12-02 17:29:56" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13408" time="2023-12-02 17:29:57" level=info msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13409" time="2023-12-02 17:29:57" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13410" time="2023-12-02 17:29:58" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13411" time="2023-12-02 17:29:58" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13412" time="2023-12-02 17:29:59" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13413" time="2023-12-02 17:29:59" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13414" time="2023-12-02 17:30:00" level=info msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13415" time="2023-12-02 17:30:00" level=debug msg=" [pipeline 
service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13416" time="2023-12-02 17:30:01" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13417" time="2023-12-02 17:30:01" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13418" time="2023-12-02 17:30:02" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31911 is nil\n" time="2023-12-02 17:30:02" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13419" time="2023-12-02 17:30:02" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13420" time="2023-12-02 17:30:03" level=info msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13421" time="2023-12-02 17:30:03" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31914 is nil\n" time="2023-12-02 17:30:03" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13422" time="2023-12-02 17:30:04" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13423" time="2023-12-02 17:30:04" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13424" time="2023-12-02 17:30:05" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13425" time="2023-12-02 17:30:05" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31918 is nil\n" time="2023-12-02 17:30:05" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13426" time="2023-12-02 17:30:06" 
level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31920 is nil\n" time="2023-12-02 17:30:06" level=info msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13427" time="2023-12-02 17:30:07" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13428" time="2023-12-02 17:30:07" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13429" time="2023-12-02 17:30:08" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13430" time="2023-12-02 17:30:08" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31924 is nil\n" time="2023-12-02 17:30:08" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13431" time="2023-12-02 17:30:09" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13432" time="2023-12-02 17:30:09" level=info msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13433" time="2023-12-02 17:30:10" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13434" time="2023-12-02 17:30:10" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13435" time="2023-12-02 17:30:11" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31929 is nil\n" time="2023-12-02 17:30:11" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13436" time="2023-12-02 17:30:11" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13437" 
time="2023-12-02 17:30:12" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13438" time="2023-12-02 17:30:12" level=info msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13439" time="2023-12-02 17:30:13" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31934 is nil\n" time="2023-12-02 17:30:13" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13440" time="2023-12-02 17:30:13" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13441" time="2023-12-02 17:30:14" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31936 is nil\n" time="2023-12-02 17:30:14" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13442" time="2023-12-02 17:30:14" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13443" time="2023-12-02 17:30:15" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13444" time="2023-12-02 17:30:15" level=info msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13445" time="2023-12-02 17:30:16" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] finished records: 13446" time="2023-12-02 17:30:16" level=debug msg=" [pipeline service] [pipeline #94] [task #1317] [calculateChangeLeadTime] [*crossdomain.ProjectPrMetric] batch save flush total 46 records to database" time="2023-12-02 17:30:16" level=info msg=" [pipeline service] [pipeline #94] [task #1317] finished step: 1 / 2" time="2023-12-02 17:30:16" level=info msg=" [pipeline service] [pipeline #94] [task #1317] executing 
subtask ConnectIncidentToDeployment" time="2023-12-02 17:30:16" level=info msg=" [pipeline service] [pipeline #94] [task #1317] [ConnectIncidentToDeployment] finished records: 1" ```

Note that there is a custom debug log that we added so we could keep track of which specific PRs do not have an associated deployment. This also helped us identify one of the last PRs processed for metrics.

### What do you expect to happen

I expected pull request metrics to be populated for all pull requests that match this query: https://github.com/apache/incubator-devlake/blob/main/backend/plugins/dora/tasks/change_lead_time_calculator.go#L52-L55

When run manually against the database, 19K rows come back, but only 13K were processed by the task.

### How to reproduce

- Associate 18-20K pull requests with a DevLake project.
- Run the dora plugin (specifically the lead time for changes subtask).
- Confirm that not all pull requests are updated with metrics.

### Anything else

_No response_

### Version

v0.19.0-beta6

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@devlake.apache.org.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org