danielemoraschi opened a new issue, #8814: URL: https://github.com/apache/incubator-devlake/issues/8814
### Search before asking - [x] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues. ### What happened When two projects share the same GitHub repository, running the pipeline for one project deletes `pull_request_issues` records created by the other project's pipeline. ### What do you expect to happen **Setup:** - Project A: Jira board with `AAA-xxx` tickets, linked to GitHub repos R1, R2 - Project B: Jira board with `BBB-xxx` tickets, linked to GitHub repos R1, R3 - Shared scope config regex: `(?i)((?:AAA|BBB)[- ][0-9]+)` - Repo R1 is shared between both projects **Steps to reproduce:** 1. Run Project A's pipeline — `pull_request_issues` are correctly created for AAA tickets 2. Run Project B's pipeline — `pull_request_issues` are correctly created for BBB tickets, but the AAA links from step 1 are deleted 3. Re-run Project A's pipeline — AAA links are recreated, but BBB links from step 2 are deleted Only the last-run project retains its PR-issue links. ## Root cause The bug is in `backend/plugins/linker/tasks/link_pr_and_issue.go`, function `clearHistoryData()` (lines 49-62): ```go func clearHistoryData(db dal.Dal, data *LinkerTaskData) errors.Error { sql := ` DELETE FROM pull_request_issues WHERE pull_request_id IN ( SELECT pr.id FROM pull_requests pr LEFT JOIN project_mapping pm ON pm.table = 'repos' AND pm.row_id = pr.base_repo_id AND pm.project_name = ? ) ` return db.Exec(sql, data.Options.ProjectName) } ``` **Two issues:** ### 1. LEFT JOIN makes the delete effectively global The `project_name = ?` filter is in the `ON` clause of a `LEFT JOIN`. With a LEFT JOIN, unmatched rows still appear in the result (with NULLs for the right side). This means the subquery returns **every PR ID in the system**, not just those belonging to the current project's repos. The delete therefore wipes the entire `pull_request_issues` table. Compare with the creation query on lines 73-76 which correctly uses `LEFT JOIN` + `WHERE`: ```go dal.Join("LEFT JOIN project_mapping pm ON (pm.table = 'repos' AND pm.row_id = pull_requests.base_repo_id)"), dal.Where("pm.project_name = ?", data.Options.ProjectName), ``` ### 2. No issue-side or source scoping Even if the JOIN were fixed, the delete would still remove: - Links created by **other projects** that share the same repos (since it only filters on the PR side, not the issue side) - Links created by the **GitHub converter** (`ConvertPullRequestIssues`), which writes to the same table with different `_raw_data_table`/`_raw_data_params` values ## Suggested fix Replace `clearHistoryData()` with a properly scoped version: ```go func clearHistoryData(db dal.Dal, data *LinkerTaskData) errors.Error { sql := ` DELETE FROM pull_request_issues WHERE pull_request_id IN ( SELECT pr.id FROM pull_requests pr INNER JOIN project_mapping pm ON pm.table = 'repos' AND pm.row_id = pr.base_repo_id WHERE pm.project_name = ? ) AND issue_id IN ( SELECT bi.issue_id FROM board_issues bi INNER JOIN project_mapping pm2 ON pm2.table = 'boards' AND pm2.row_id = bi.board_id WHERE pm2.project_name = ? ) AND (_raw_data_table = '' OR _raw_data_table IS NULL) AND _raw_data_remark LIKE '%pull_requests,%' ` return db.Exec(sql, data.Options.ProjectName, data.Options.ProjectName) } ``` This adds 3 scoping conditions: 1. **INNER JOIN + WHERE** for `project_name` (fixes the LEFT JOIN bug) 2. **Issue in current project's boards** (prevents deleting other projects' links) 3. **Linker-created rows only** via `_raw_data_table`/`_raw_data_remark` (prevents deleting GitHub converter rows) ## What I'd expect to happen Running the pipeline for one project should only delete and recreate `pull_request_issues` records that were created by that project's linker task. Links from other projects and from the GitHub converter should be preserved. ### How to reproduce 1. Create two projects that share at least one GitHub repo 2. Configure both with Jira boards containing different issue key prefixes 3. Set a shared PR-to-issue regex that matches both prefixes 4. Ensure PRs in the shared repo reference issue keys from both projects 5. Run pipeline for Project A — verify links exist 6. Run pipeline for Project B — observe that Project A's links are deleted ### Anything else Reproducible on current `main` branch. The bug was introduced in commit `a4cb023ba` (May 2024, "Clear history data when running linker"). The existing e2e test (`backend/plugins/linker/e2e/link_pr_and_issue_test.go`) only covers a single project, so the multi-project regression is not caught. Verification queries: ```sql -- Check PR-issue links for a specific issue SELECT pr.id, pr.pull_request_key, pr.base_repo_id FROM pull_request_issues pri JOIN pull_requests pr ON pri.pull_request_id = pr.id WHERE pri.issue_id = 'jira:JiraIssue:1:<issue_id>'; -- Check raw data provenance of links SELECT pull_request_id, issue_id, _raw_data_table, _raw_data_remark FROM pull_request_issues WHERE pull_request_id LIKE '%<repo_id>%'; ### Version main ### Are you willing to submit PR? - [x] Yes I am willing to submit a PR! ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
