danielemoraschi opened a new issue, #8814:
URL: https://github.com/apache/incubator-devlake/issues/8814

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   When two projects share the same GitHub repository, running the pipeline for 
one project deletes `pull_request_issues` records created by the other 
project's pipeline.
   
   
   ### What do you expect to happen
   
   **Setup:**
   - Project A: Jira board with `AAA-xxx` tickets, linked to GitHub repos R1, R2
   - Project B: Jira board with `BBB-xxx` tickets, linked to GitHub repos R1, R3
   - Shared scope config regex: `(?i)((?:AAA|BBB)[- ][0-9]+)`
   - Repo R1 is shared between both projects
   
   **Steps to reproduce:**
   1. Run Project A's pipeline — `pull_request_issues` are correctly created 
for AAA tickets
   2. Run Project B's pipeline — `pull_request_issues` are correctly created 
for BBB tickets, but the AAA links from step 1 are deleted
   3. Re-run Project A's pipeline — AAA links are recreated, but BBB links from 
step 2 are deleted
   
   Only the last-run project retains its PR-issue links.
   
   ## Root cause
   
   The bug is in `backend/plugins/linker/tasks/link_pr_and_issue.go`, function 
`clearHistoryData()` (lines 49-62):
   
   ```go
   func clearHistoryData(db dal.Dal, data *LinkerTaskData) errors.Error {
        sql := `
        DELETE FROM pull_request_issues
                WHERE pull_request_id IN (
                        SELECT pr.id
                                FROM pull_requests pr
                                        LEFT JOIN project_mapping pm
                                        ON pm.table = 'repos'
                                                AND pm.row_id = pr.base_repo_id
                                                AND pm.project_name = ?
        )
   `
        return db.Exec(sql, data.Options.ProjectName)
   }
   ```
   
   **Two issues:**
   
   ### 1. LEFT JOIN makes the delete effectively global
   
   The `project_name = ?` filter is in the `ON` clause of a `LEFT JOIN`. With a 
LEFT JOIN, unmatched rows still appear in the result (with NULLs for the right 
side). This means the subquery returns **every PR ID in the system**, not just 
those belonging to the current project's repos. The delete therefore wipes the 
entire `pull_request_issues` table.
   
   Compare with the creation query on lines 73-76 which correctly uses `LEFT 
JOIN` + `WHERE`:
   ```go
   dal.Join("LEFT JOIN project_mapping pm ON (pm.table = 'repos' AND pm.row_id 
= pull_requests.base_repo_id)"),
   dal.Where("pm.project_name = ?", data.Options.ProjectName),
   ```
   
   ### 2. No issue-side or source scoping
   
   Even if the JOIN were fixed, the delete would still remove:
   - Links created by **other projects** that share the same repos (since it 
only filters on the PR side, not the issue side)
   - Links created by the **GitHub converter** (`ConvertPullRequestIssues`), 
which writes to the same table with different 
`_raw_data_table`/`_raw_data_params` values
   
   ## Suggested fix
   
   Replace `clearHistoryData()` with a properly scoped version:
   
   ```go
   func clearHistoryData(db dal.Dal, data *LinkerTaskData) errors.Error {
        sql := `
        DELETE FROM pull_request_issues
                WHERE pull_request_id IN (
                        SELECT pr.id
                                FROM pull_requests pr
                                        INNER JOIN project_mapping pm
                                        ON pm.table = 'repos'
                                                AND pm.row_id = pr.base_repo_id
                                WHERE pm.project_name = ?
                )
                AND issue_id IN (
                        SELECT bi.issue_id
                                FROM board_issues bi
                                        INNER JOIN project_mapping pm2
                                        ON pm2.table = 'boards'
                                                AND pm2.row_id = bi.board_id
                                WHERE pm2.project_name = ?
                )
                AND (_raw_data_table = '' OR _raw_data_table IS NULL)
                AND _raw_data_remark LIKE '%pull_requests,%'
   `
        return db.Exec(sql, data.Options.ProjectName, data.Options.ProjectName)
   }
   ```
   
   This adds 3 scoping conditions:
   1. **INNER JOIN + WHERE** for `project_name` (fixes the LEFT JOIN bug)
   2. **Issue in current project's boards** (prevents deleting other projects' 
links)
   3. **Linker-created rows only** via `_raw_data_table`/`_raw_data_remark` 
(prevents deleting GitHub converter rows)
   
   
   ## What I'd expect to happen
   
   Running the pipeline for one project should only delete and recreate 
`pull_request_issues` records that were created by that project's linker task. 
Links from other projects and from the GitHub converter should be preserved.
   
   ### How to reproduce
   
   1. Create two projects that share at least one GitHub repo
   2. Configure both with Jira boards containing different issue key prefixes
   3. Set a shared PR-to-issue regex that matches both prefixes
   4. Ensure PRs in the shared repo reference issue keys from both projects
   5. Run pipeline for Project A — verify links exist
   6. Run pipeline for Project B — observe that Project A's links are deleted
   
   ### Anything else
   
   Reproducible on current `main` branch. The bug was introduced in commit 
`a4cb023ba` (May 2024, "Clear history data when running linker").
   
   The existing e2e test 
(`backend/plugins/linker/e2e/link_pr_and_issue_test.go`) only covers a single 
project, so the multi-project regression is not caught.
   
   Verification queries:
   
   ```sql
   -- Check PR-issue links for a specific issue
   SELECT pr.id, pr.pull_request_key, pr.base_repo_id
   FROM pull_request_issues pri
   JOIN pull_requests pr ON pri.pull_request_id = pr.id
   WHERE pri.issue_id = 'jira:JiraIssue:1:<issue_id>';
   
   -- Check raw data provenance of links
   SELECT pull_request_id, issue_id, _raw_data_table, _raw_data_remark
   FROM pull_request_issues
   WHERE pull_request_id LIKE '%<repo_id>%';
   
   ### Version
   
   main
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to