jcg-juan opened a new issue, #8571: URL: https://github.com/apache/incubator-devlake/issues/8571
### Search before asking - [x] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues. ### What happened We’ve identified a data inconsistency with how DevLake ingests GitHub Pull Requests. Since around March 2025, all PRs that are merged in GitHub are being stored in DevLake as state = closed and merged = false. This results in: - Merged PRs not being reflected correctly in queries (state = 'MERGED' returns nothing after February). - Metrics such as additions/deletions for merged PRs show up as 0. - Dashboards and reports undercounting merged PR activity. Impact: - Any downstream metrics that rely on merged PRs (e.g., code change size, lead time, DORA metrics) are currently inaccurate. - Dashboards show zero additions/deletions after Feb 2025. ### What do you expect to happen - DevLake should correctly recognise merged PRs (state = 'MERGED') or, at a minimum, set the merged column to true when GitHub API reports it. - Additions/deletions should be populated for merged PRs the same way as open ones. ### How to reproduce Steps to Reproduce: - Ingest PR data from GitHub into DevLake. - Run the following query to check the latest PRs: ``` SELECT * FROM public._tool_github_pull_requests ORDER BY github_updated_at DESC LIMIT 1000; ``` - Observe that since March 2025, all PRs are either open or closed, never MERGED. - Additionally, **merged** column is always **false**, even when GitHub API reports merged = true. ### Anything else 1. Example GitHub API response for a merged PR (PR #1 before March working as expected): ``` curl -s -H "Accept: application/vnd.github+json" \ -H "Authorization: Bearer <token>" \ https://api.github.com/repos/<org>/<repo>/pulls/1 \ | jq '{number, state, merged, additions, deletions}' ``` Response: ``` { "number": 1, "state": "closed", "merged": true, "additions": 203, "deletions": 0 } From Devlake DB (table _tool_github_pull_requests): ``` <html><head></head><body> connection_id | github_id | repo_id | number | state | title | created_at | updated_at | closed_at | additions | deletions | changed_files | review_comments | comments | merged | merge_commit_sha | head_ref | base_ref | head_sha | base_sha | url | author | author_id | _created_at | _updated_at | raw_data_origin | raw_table | raw_data_id | org_id | installation_id | commits | is_draft -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- 1 | 2019299654 | 842380603 | 1 | MERGED | initial setup | 2024-08-14 13:22:49 | 2024-08-14 13:26:32 | 2024-08-14 13:26:31 | 203 | 0 | 0 | 0 | 0 | false | (redacted) | setup-action | main | (redacted) | (redacted) | (redacted) | dev-user | 132356641 | 2025-03-30 00:09:11.131044 | 2025-03-30 00:09:11.131044 | {"ConnectionId":1,"Name":"…"} | _raw_github_graphql_prs | 171 | (null) | 0 | 0 | false </body></html> 2. Example GitHub API response for a merged PR (PR #73 after March NOT working as expected): ``` curl -s -H "Accept: application/vnd.github+json" \ -H "Authorization: Bearer <token>" \ https://api.github.com/repos/<org>/<repo>/pulls/73\ | jq '{number, state, merged, additions, deletions}' ``` Response: ``` { "number": 73, "state": "closed", "merged": true, "additions": 4, "deletions": 1 } ``` From Devlake DB (table _tool_github_pull_requests): <html><head></head><body> connection_id | github_id | repo_id | number | state | title | created_at | updated_at | closed_at | additions | deletions | changed_files | review_comments | comments | merged | merge_commit_sha | head_ref | base_ref | head_sha | base_sha | url | author | author_id | _created_at | _updated_at | raw_data_origin | raw_table | raw_data_id | org_id | installation_id | commits | is_draft -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- 2 | 2815676617 | 842380603 | 73 | closed | Cicd usage script fix: check if action enabled | 2025-09-10 14:09:57 | 2025-09-10 16:05:53 | 2025-09-10 16:05:52 | 0 | 0 | 0 | 0 | 0 | false | (redacted) | feature-branch | main | (redacted) | (redacted) | (redacted) | dev-user | 131661649 | 2025-09-12 00:00:49.359316 | 2025-09-12 00:00:49.359316 | {"ConnectionId":2,"Name":"…"} | _raw_github_api_pull_requests | 30757 | (null) | 842380603 | 0 | false </body></html> ### Version v1.0.2 ### Are you willing to submit PR? - [ ] Yes I am willing to submit a PR! ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
