LucianoKratzer opened a new issue, #8689:
URL: https://github.com/apache/incubator-devlake/issues/8689

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   ### Description
   When running the GitHub data collection pipeline, several Pull Requests that 
were merged or closed on GitHub more than a month ago still retain the `OPEN` 
status in the DevLake database. This issue occurs consistently across our 
production and local environments.
   
   ### Technical Details
   - **The Issue:** Even though the `updated_at` field in the database is 
updated to the timestamp of the last pipeline execution, the `status` field 
remains stuck as `OPEN`.
   - **Environment & Volume:**
     - We deal with a high volume of data (repositories with a large history of 
PRs and many comments).
     - The issue has persisted since the first deployment; we haven't been able 
to achieve a successful status sync for these records.
   - **Data Evidence:** Out of **932** pull requests analyzed:
     - **650 PRs** are merged/closed on GitHub but stuck as `OPEN` in DevLake.
     - **282 PRs** are correctly marked as `OPEN`.
   - **Version Info:**
     - Local: `version:dev`
     - Production: `local_build@unknown_sha`
   - **Database:** Verified that the DB user has full `READ`, `UPDATE`, and 
`DELETE` permissions.
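
The counts above can be reproduced with a comparison along these lines. This is only a minimal sketch: `find_stale_open` and `stale_ratio` are hypothetical helper names, and the two dictionaries (PR number to state) are assumed to have been exported beforehand from GitHub and from the DevLake database. It does not use any DevLake API.

```python
from typing import Dict, List


def find_stale_open(github_state: Dict[int, str],
                    devlake_status: Dict[int, str]) -> List[int]:
    """Return PR numbers stored as OPEN in DevLake but MERGED/CLOSED on GitHub.

    Both inputs are hypothetical exports mapping PR number -> state string.
    """
    stale = []
    for number, local in devlake_status.items():
        upstream = github_state.get(number)
        if local == "OPEN" and upstream in ("MERGED", "CLOSED"):
            stale.append(number)
    return sorted(stale)


def stale_ratio(stale_count: int, total: int) -> float:
    """Share of analyzed PRs that are stale; 0.0 when nothing was analyzed."""
    return stale_count / total if total else 0.0


if __name__ == "__main__":
    # Figures taken from the report: 650 stale out of 932 analyzed.
    print(f"{stale_ratio(650, 932):.1%}")  # prints 69.7%
```

With the reported figures this gives roughly 70%, which matches the share quoted under "Observed Behavior".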
   
   ### Error Logs
   While most repositories finish, one specific large repository 
(`asaasdev/asaas-core`) consistently fails with:
   ```
   github_graphql:asaasdev/asaas-core
   subtask Collect Pull Requests ended unexpectedly Wraps: (2) | combined 
messages: | { | graphql query got error | ===================== | graphql query 
got error | } Error types: (1) *hintdetail.withDetail (2) *errors.errorString.
   ```
   
   I believe that despite the error in one repository, the status of PRs in 
other successfully collected repositories should have been synchronized 
correctly.
   
   
   ### What do you expect to happen
   
   I expect the DevLake GitHub plugin to correctly synchronize the `status` of 
Pull Requests during incremental or full collection. 
   
   If a Pull Request has been changed to `MERGED` or `CLOSED` on GitHub, the 
corresponding record in the `pull_requests` table should be updated to reflect 
that change, ensuring the integrity of the data used for DORA metrics and 
organizational dashboards.
   
   ### How to reproduce
   
   ### Steps to Reproduce
   The issue occurs consistently in our production environment since the 
initial deployment. There is no specific edge case; it happens across all 
GitHub data collection pipelines.
   
   1. Set up a GitHub connection with a high volume of data (multiple 
repositories, large history of PRs and comments).
   2. Run the initial collection pipeline (at this point, PR statuses are 
stored correctly).
   3. Update several PR statuses on GitHub (e.g., Merge or Close an open PR).
   4. Run the collection pipeline again (Incremental or Full).
   5. Observe that in the DevLake database, the `updated_at` field changes to 
the current time, but the `status` remains `OPEN`, failing to reflect the 
changes made on GitHub.
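
The check in step 5 can be sketched as a query for rows that were touched by the latest run but are still `OPEN`. The example below is an illustration only: it uses an in-memory SQLite database as a stand-in for the real DevLake database, and the `id` column, the sample rows, and the `touched_but_still_open` helper are assumptions. Only the `pull_requests` table and its `status` and `updated_at` fields come from the report.

```python
import sqlite3
from typing import List


def touched_but_still_open(conn: sqlite3.Connection,
                           previous_run_at: str) -> List[str]:
    """List PR ids updated after the previous run whose status is still OPEN.

    Timestamps are ISO-8601 strings, so string comparison orders them correctly.
    """
    rows = conn.execute(
        "SELECT id FROM pull_requests "
        "WHERE status = 'OPEN' AND updated_at > ? ORDER BY id",
        (previous_run_at,),
    ).fetchall()
    return [row[0] for row in rows]


def demo() -> List[str]:
    # Stand-in schema and data; the real table has many more columns.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pull_requests (id TEXT PRIMARY KEY, status TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO pull_requests VALUES (?, ?, ?)",
        [
            ("pr-1", "OPEN", "2024-01-02T00:00:00"),
            ("pr-2", "MERGED", "2024-01-02T00:00:00"),
            ("pr-3", "OPEN", "2023-12-01T00:00:00"),
        ],
    )
    return touched_but_still_open(conn, "2024-01-01T00:00:00")


if __name__ == "__main__":
    print(demo())
```

The result is only a candidate list, since PRs that are genuinely open on GitHub also match; each id still has to be cross-checked against its state on GitHub.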
   
   ### Observed Behavior
   - The issue is persistent across all pipeline executions.
   - The high volume of data (PRs and comments) might be a contributing factor.
   - Even with full database permissions, the synchronization of the `status` 
field fails for approximately 70% of the PRs analyzed (650 of 932).
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   dev / local_build@unknown_sha (built from source)
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
