LucianoKratzer opened a new issue, #8689: URL: https://github.com/apache/incubator-devlake/issues/8689
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues.

### What happened

### Description

When running the GitHub data collection pipeline, several Pull Requests that were merged or closed on GitHub more than a month ago still retain the `OPEN` status in the DevLake database. This issue occurs consistently across our production and local environments.

### Technical Details

- **The Issue:** Even though the `updated_at` field in the database is updated to the timestamp of the last pipeline execution, the `status` field remains stuck as `OPEN`.
- **Environment & Volume:**
  - We deal with a high volume of data (repositories with a large history of PRs and many comments).
  - The issue has persisted since the first deployment; we haven't been able to achieve a successful status sync for these records.
- **Data Evidence:** Out of **932** pull requests analyzed:
  - **650 PRs** are merged/closed on GitHub but stuck as `OPEN` in DevLake.
  - **282 PRs** are correctly marked as `OPEN`.
- **Version Info:**
  - Local: `version:dev`
  - Production: `local_build@unknown_sha`
- **Database:** Verified that the DB user has full `READ`, `UPDATE`, and `DELETE` permissions.

### Error Logs

While most repositories finish, one specific large repository (`asaasdev/asaas-core`) consistently fails with:

```
github_graphql:asaasdev/asaas-core subtask Collect Pull Requests ended unexpectedly
Wraps: (2)
  | combined messages:
  | {
  | graphql query got error
  | =====================
  | graphql query got error
  | }
Error types: (1) *hintdetail.withDetail (2) *errors.errorString.
```

I believe that despite the error in one repository, the status of PRs in other successfully collected repositories should have been synchronized correctly.

### What do you expect to happen

I expect the DevLake GitHub plugin to correctly synchronize the `status` of Pull Requests during incremental or full collection.
If a Pull Request has been changed to `MERGED` or `CLOSED` on GitHub, the corresponding record in the `pull_requests` table should be updated to reflect that change, ensuring the integrity of the data used for DORA metrics and organizational dashboards.

### How to reproduce

### Steps to Reproduce

The issue occurs consistently in our production environment since the initial deployment. There is no specific edge case; it happens across all GitHub data collection pipelines.

1. Set up a GitHub connection with a high volume of data (multiple repositories, large history of PRs and comments).
2. Run the initial collection pipeline. (PRs initially correctly reflect their status).
3. Update several PR statuses on GitHub (e.g., Merge or Close an open PR).
4. Run the collection pipeline again (Incremental or Full).
5. Observe that in the DevLake database, the `updated_at` field changes to the current time, but the `status` remains `OPEN`, failing to reflect the changes made on GitHub.

### Observed Behavior

- The issue is persistent across all pipeline executions.
- The high volume of data (PRs and comments) might be a contributing factor.
- Even with full database permissions, the synchronization of the `status` field fails for approximately 70% of updated PRs.

### Anything else

_No response_

### Version

dev / local_build@unknown_sha (built from source)

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
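
### Diagnostic sketch

The comparison described in the reproduction steps (the status stored in DevLake versus the state on GitHub) could be checked with something like the Python sketch below. It is a minimal illustration, not part of DevLake: the function names `find_stale_open` and `stale_ratio`, the `(pr_number, status)` row shape, and the `github_states` mapping are all hypothetical, and the real `pull_requests` columns may be named differently. Fetching the rows and the GitHub states is left out on purpose.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shape: (pr_number, status) as read from the DevLake
# `pull_requests` table. The actual column names may differ.
DevLakeRow = Tuple[int, str]


def find_stale_open(devlake_rows: Iterable[DevLakeRow],
                    github_states: Dict[int, str]) -> List[int]:
    """Return PR numbers stored as OPEN in DevLake but not open on GitHub."""
    stale = []
    for number, status in devlake_rows:
        gh_state = github_states.get(number)
        if gh_state is None:
            # PR missing from the GitHub snapshot: nothing to compare against.
            continue
        if status.upper() == "OPEN" and gh_state.upper() != "OPEN":
            stale.append(number)
    return sorted(stale)


def stale_ratio(stale_count: int, total: int) -> float:
    """Fraction of analyzed PRs that are stale; 0.0 when total is zero."""
    return stale_count / total if total else 0.0


if __name__ == "__main__":
    # Made-up sample data, only to show the call shape.
    rows = [(1, "OPEN"), (2, "OPEN"), (3, "MERGED"), (4, "OPEN")]
    github = {1: "MERGED", 2: "OPEN", 3: "MERGED", 4: "CLOSED"}
    print(find_stale_open(rows, github))
```

Applied to the counts reported above, `stale_ratio(650, 932)` gives roughly 0.70, which matches the "approximately 70%" figure.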
