elijah-roberts opened a new issue, #6612:
URL: https://github.com/apache/incubator-devlake/issues/6612

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   We have a large GitHub repo with a lot of activity 
(commits/branches/pull requests), and pull request metrics were not generated for 
all known pull requests (approximately 21K).
   
   The repo in question has a lot of pull requests both historically and on an 
ongoing basis.
   
   
![image](https://github.com/apache/incubator-devlake/assets/19742213/d6602ce2-79df-4c9c-a3b4-84834de0df4e)
   
   We initially started with 18 months of data (starting in the middle of 2022) 
and would only get metrics through July of 2023. By removing pull requests prior 
to 2023 (8K), we were able to parse through October 2023.
   
   Here is an example of the distribution:
   
   ```
   select count(id), MONTH(created_date) as month from
       pull_requests group by month;
   ```
   <img width="462" alt="12 rows" 
src="https://github.com/apache/incubator-devlake/assets/19742213/c66353b0-8b07-4bc1-83bf-7f534dad9f80">
   
   ```
   select count(number), MONTH(github_created_at) month from 
_tool_github_pull_requests Where connection_id = 3 GROUP BY month;
   ```
   <img width="492" alt="12 rows v" 
src="https://github.com/apache/incubator-devlake/assets/19742213/559533ef-631c-494d-9005-c2e550144d30">
   
   
   We were able to further mitigate our issue by reducing the number of 
pull requests to six months of data:
   
![image](https://github.com/apache/incubator-devlake/assets/19742213/5c49c211-1a9c-4a07-94a1-9feaebc4fa16)
   
   Doing this and re-running the DORA task for lead time for changes correctly 
populated the pull request metrics for all known pull requests.
   
   But this is a short-term fix, because as we continue to load data we will 
eventually hit the same limit.
   
   The logs for the DORA metric task report no errors; at a certain point the 
task simply appears to stop processing data. My only guess is that some kind of 
limit on the cursor object is preventing it from loading all of the rows: 
https://github.com/apache/incubator-devlake/blob/863863aa5618705f40957c21be099b6c84de62cb/backend/plugins/dora/tasks/change_lead_time_calculator.go#L51
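To illustrate the kind of silent early stop I have in mind, here is a minimal Go sketch. It is not DevLake's code and not a diagnosis: `rowCursor`, `fakeCursor` and `countRows` are hypothetical names, and the interface only mirrors the `Next`/`Err` shape of Go's `database/sql.Rows`. The row counts are illustrative, loosely based on the numbers in this report. It shows that a loop over `Next()` ends the same way whether the rows ran out or the cursor failed, so the failure is only visible if `Err()` is checked afterwards.

```go
package main

import (
	"errors"
	"fmt"
)

// rowCursor is a hypothetical interface with the same Next/Err shape as
// database/sql.Rows. It is NOT DevLake's actual cursor type.
type rowCursor interface {
	Next() bool
	Err() error
}

// fakeCursor yields `total` rows, but fails once `failAfter` rows have been
// returned. A negative failAfter disables the simulated failure.
type fakeCursor struct {
	total, failAfter, pos int
	err                   error
}

func (c *fakeCursor) Next() bool {
	if c.failAfter >= 0 && c.pos == c.failAfter {
		// Simulated failure: Next reports false, exactly as it does at end of data.
		c.err = errors.New("simulated cursor failure")
		return false
	}
	if c.pos >= c.total {
		return false
	}
	c.pos++
	return true
}

func (c *fakeCursor) Err() error { return c.err }

// countRows drains the cursor and returns the number of rows seen.
// Checking Err() after the loop is what separates "done" from "stopped early".
func countRows(c rowCursor) (int, error) {
	n := 0
	for c.Next() {
		n++
	}
	if err := c.Err(); err != nil {
		return n, fmt.Errorf("cursor stopped after %d rows: %w", n, err)
	}
	return n, nil
}

func main() {
	// Illustrative numbers only: 19000 rows expected, failure after 13446.
	n, err := countRows(&fakeCursor{total: 19000, failAfter: 13446})
	fmt.Println(n, err)
}
```

If the `Err()` check were dropped from `countRows`, the caller would see a row count and no error, which would look like the behaviour described above.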
   
   Example logs:
   
   ```
   time="2023-12-02 17:29:49" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13394"
   time="2023-12-02 17:29:50" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13395"
   time="2023-12-02 17:29:50" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31885 is nil\n"
   time="2023-12-02 17:29:50" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13396"
   time="2023-12-02 17:29:51" level=info msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13397"
   time="2023-12-02 17:29:51" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13398"
   time="2023-12-02 17:29:52" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13399"
   time="2023-12-02 17:29:52" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] [*crossdomain.ProjectPrMetric] 
batch save flush total 100 records to database"
   time="2023-12-02 17:29:52" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13400"
   time="2023-12-02 17:29:53" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13401"
   time="2023-12-02 17:29:53" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13402"
   time="2023-12-02 17:29:54" level=info msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13403"
   time="2023-12-02 17:29:54" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13404"
   time="2023-12-02 17:29:55" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13405"
   time="2023-12-02 17:29:55" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13406"
   time="2023-12-02 17:29:56" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13407"
   time="2023-12-02 17:29:56" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31898 is nil\n"
   time="2023-12-02 17:29:56" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13408"
   time="2023-12-02 17:29:57" level=info msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13409"
   time="2023-12-02 17:29:57" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13410"
   time="2023-12-02 17:29:58" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13411"
   time="2023-12-02 17:29:58" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13412"
   time="2023-12-02 17:29:59" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13413"
   time="2023-12-02 17:29:59" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13414"
   time="2023-12-02 17:30:00" level=info msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13415"
   time="2023-12-02 17:30:00" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13416"
   time="2023-12-02 17:30:01" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13417"
   time="2023-12-02 17:30:01" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13418"
   time="2023-12-02 17:30:02" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31911 is nil\n"
   time="2023-12-02 17:30:02" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13419"
   time="2023-12-02 17:30:02" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13420"
   time="2023-12-02 17:30:03" level=info msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13421"
   time="2023-12-02 17:30:03" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31914 is nil\n"
   time="2023-12-02 17:30:03" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13422"
   time="2023-12-02 17:30:04" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13423"
   time="2023-12-02 17:30:04" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13424"
   time="2023-12-02 17:30:05" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13425"
   time="2023-12-02 17:30:05" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31918 is nil\n"
   time="2023-12-02 17:30:05" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13426"
   time="2023-12-02 17:30:06" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31920 is nil\n"
   time="2023-12-02 17:30:06" level=info msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13427"
   time="2023-12-02 17:30:07" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13428"
   time="2023-12-02 17:30:07" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13429"
   time="2023-12-02 17:30:08" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13430"
   time="2023-12-02 17:30:08" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31924 is nil\n"
   time="2023-12-02 17:30:08" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13431"
   time="2023-12-02 17:30:09" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13432"
   time="2023-12-02 17:30:09" level=info msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13433"
   time="2023-12-02 17:30:10" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13434"
   time="2023-12-02 17:30:10" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13435"
   time="2023-12-02 17:30:11" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31929 is nil\n"
   time="2023-12-02 17:30:11" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13436"
   time="2023-12-02 17:30:11" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13437"
   time="2023-12-02 17:30:12" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13438"
   time="2023-12-02 17:30:12" level=info msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13439"
   time="2023-12-02 17:30:13" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31934 is nil\n"
   time="2023-12-02 17:30:13" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13440"
   time="2023-12-02 17:30:13" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13441"
   time="2023-12-02 17:30:14" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] deploy time of pr 31936 is nil\n"
   time="2023-12-02 17:30:14" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13442"
   time="2023-12-02 17:30:14" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13443"
   time="2023-12-02 17:30:15" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13444"
   time="2023-12-02 17:30:15" level=info msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13445"
   time="2023-12-02 17:30:16" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] finished records: 13446"
   time="2023-12-02 17:30:16" level=debug msg=" [pipeline service] [pipeline 
#94] [task #1317] [calculateChangeLeadTime] [*crossdomain.ProjectPrMetric] 
batch save flush total 46 records to database"
   time="2023-12-02 17:30:16" level=info msg=" [pipeline service] [pipeline 
#94] [task #1317] finished step: 1 / 2"
   time="2023-12-02 17:30:16" level=info msg=" [pipeline service] [pipeline 
#94] [task #1317] executing subtask ConnectIncidentToDeployment"
   time="2023-12-02 17:30:16" level=info msg=" [pipeline service] [pipeline 
#94] [task #1317] [ConnectIncidentToDeployment] finished records: 1"
   ```
   
   Note that there is a custom debug log that we added so we could keep track of 
which specific PRs do not have an associated deployment. This also helped us 
identify one of the last PRs processed for metrics. 
   
   ### What do you expect to happen
   
   I expected pull request metrics to be populated for all pull requests that 
match this query:
   
   
https://github.com/apache/incubator-devlake/blob/main/backend/plugins/dora/tasks/change_lead_time_calculator.go#L52-L55
   
   When run manually against the database, the query returns 19K rows, but only 
13K were processed by the task.
   
   ### How to reproduce
   
   - Associate 18-20K pull requests with a DevLake project.
   - Run the dora plugin (specifically the lead time for changes subtask).
   - Confirm that not all pull requests are updated with metrics.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   v0.19.0-beta6
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

