AnkeshThakur opened a new issue, #8361:
URL: https://github.com/apache/incubator-devlake/issues/8361

   <!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
   -->
   
   ## Question
   We have been using devlake for a while and we are increasing its scope to 
monitor many of our projects. With increased usage, we are hit with very long 
Dora calculation step. Total process takes around 10+ hours to complete. Logs 
says dora's `calculateChangeLeadTime` is taking a lot of time (log snip is 
below). 
   
   Is there any known way to improve it significantly? Have added few indexes 
but they did not help. Maybe more indexes or some parameters to boost 
concurrency?
   
   ## Screenshots
   If applicable, add screenshots to help explain.
   
   ## Additional context
   
   Slow query log is showing that these tasks are processing one data point at 
a time and scan significantly high number of rows. Adding index to relevant 
columns improves efficiency by ~25%. However, overall process still takes good 
amount of time which makes it unsuitable for daily data refresh because during 
this task no data is available in `project_pr_metrics` and other important 
tables.
   Sample slow query logs:
   
   ```
   # Time: 2025-03-25T10:26:54.420337Z
   # User@Host: ########[#####] @  [##.##.###.##]  Id: 17460
   # Query_time: 10.153949  Lock_time: 0.000004 Rows_sent: 1  Rows_examined: 
314841
   SET timestamp=1742898404;
   SELECT dc.* FROM cicd_deployment_commits dc LEFT JOIN 
cicd_deployment_commits p ON (dc.prev_success_deployment_commit_id = p.id) LEFT 
JOIN project_mapping pm ON (pm.table = 'cicd_scopes' AND pm.row_id = 
dc.cicd_scope_id) INNER JOIN commits_diffs cd ON (cd.new_commit_sha = 
dc.commit_sha AND cd.old_commit_sha = COALESCE (p.commit_sha, '')) WHERE 
dc.prev_success_deployment_commit_id <> '' AND dc.environment = 'PRODUCTION' 
AND (pm.project_name = '#####' AND cd.commit_sha = 
'csrffa028d4c5c85341f861beb685fba4e115e65' AND dc.RESULT = 'SUCCESS') ORDER BY 
dc.started_date, dc.id ASC LIMIT 1;
   ```
   
   Log of `calculateChangeLeadTime`
   ```
   [GIN] 2025/03/25 - 13:41:54 | 200 |        2.83µs |   10.221.51.137 | GET    
  "/ping"
   time="2025-03-25 13:41:55" level=info msg=" [pipeline service] [pipeline 
#144] [task #2081] [calculateChangeLeadTime] finished records: 1699(not 
exactly)"
   time="2025-03-25 13:41:58" level=info msg=" [pipeline service] [pipeline 
#144] [task #2081] [calculateChangeLeadTime] finished records: 1701(not 
exactly)"
   [GIN] 2025/03/25 - 13:41:59 | 200 |        2.35µs |   10.221.51.137 | GET    
  "/ping"
   [GIN] 2025/03/25 - 13:41:59 | 200 |        1.34µs |   10.221.51.137 | GET    
  "/ping"
   time="2025-03-25 13:42:02" level=info msg=" [pipeline service] [pipeline 
#144] [task #2081] [calculateChangeLeadTime] finished records: 1703(not 
exactly)"
   [GIN] 2025/03/25 - 13:42:04 | 200 |        2.62µs |   10.221.51.137 | GET    
  "/ping"
   [GIN] 2025/03/25 - 13:42:04 | 200 |         880ns |   10.221.51.137 | GET    
  "/ping"
   time="2025-03-25 13:42:06" level=info msg=" [pipeline service] [pipeline 
#144] [task #2081] [calculateChangeLeadTime] finished records: 1705(not 
exactly)"
   [GIN] 2025/03/25 - 13:42:09 | 200 |        2.69µs |   10.221.51.137 | GET    
  "/ping"
   [GIN] 2025/03/25 - 13:42:09 | 200 |         960ns |   10.221.51.137 | GET    
  "/ping"
   time="2025-03-25 13:42:10" level=info msg=" [pipeline service] [pipeline 
#144] [task #2081] [calculateChangeLeadTime] finished records: 1707(not 
exactly)"
   time="2025-03-25 13:42:13" level=info msg=" [pipeline service] [pipeline 
#144] [task #2081] [calculateChangeLeadTime] finished records: 1709(not 
exactly)"
   [GIN] 2025/03/25 - 13:42:14 | 200 |        2.63µs |   10.221.51.137 | GET    
  "/ping"
   [GIN] 2025/03/25 - 13:42:14 | 200 |         880ns |   10.221.51.137 | GET    
  "/ping"
   time="2025-03-25 13:42:16" level=info msg=" [pipeline service] [pipeline 
#144] [task #2081] [calculateChangeLeadTime] finished records: 1711(not 
exactly)"
   [GIN] 2025/03/25 - 13:42:19 | 200 |        2.35µs |   10.221.51.137 | GET    
  "/ping"
   [GIN] 2025/03/25 - 13:42:19 | 200 |         870ns |   10.221.51.137 | GET    
  "/ping"
   time="2025-03-25 13:42:20" level=info msg=" [pipeline service] [pipeline 
#144] [task #2081] [calculateChangeLeadTime] finished records: 1713(not 
exactly)"
   time="2025-03-25 13:42:23" level=info msg=" [pipeline service] [pipeline 
#144] [task #2081] [calculateChangeLeadTime] finished records: 1715(not 
exactly)"
   [GIN] 2025/03/25 - 13:42:24 | 200 |        2.41µs |   10.221.51.137 | GET    
  "/ping"
   [GIN] 2025/03/25 - 13:42:24 | 200 |        1.28µs |   10.221.51.137 | GET    
  "/ping"
   time="2025-03-25 13:42:26" level=info msg=" [pipeline service] [pipeline 
#144] [task #2081] [calculateChangeLeadTime] finished records: 1717(not 
exactly)"
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@devlake.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to