zahorniak opened a new pull request, #8912:
URL: https://github.com/apache/devlake/pull/8912

   ### ⚠️ Pre Checklist
   
   - [x] I have read through the [Contributing 
Documentation](https://devlake.apache.org/community/).
   - [x] I have added relevant tests.
   - [ ] I have added relevant documentation. <!-- N/A: bug fix, no 
behavior/config docs affected -->
   - [x] I will add labels to the PR, such as `pr-type/bug-fix`.
   
   ### Summary
   
   The CircleCI `collectJobs` subtask runs two API collectors that share the 
URL template `/v2/workflow/{{ .Input.Id }}/job`:
   
   - The **new-records** collector selects `id` from 
`_tool_circleci_workflows`, so `.Input.Id` is populated and the URL is 
well-formed.
   - The **"unfinished details"** collector (which re-collects jobs for 
workflows that are still in flight) selected `DISTINCT workflow_id` from 
`_tool_circleci_jobs` and scanned each row into a `models.CircleciJob`. Since 
the selected column is `workflow_id`, the scan filled `CircleciJob.WorkflowId` 
and left `CircleciJob.Id` **empty** — so the shared template collapsed to 
`/v2/workflow//job`, which the CircleCI API answers with **HTTP 500**.
   
   This triggers whenever a job is in a non-terminal status (`running` / 
`not_running` / `queued` / `on_hold`) — e.g. an approval-gated workflow sitting 
`on_hold`. The 500 aborts the whole CircleCI collection, so all CI/CD data (and 
the DORA metrics derived from it) goes stale.
   
   #### Fix
   
   Alias the projection to `DISTINCT workflow_id AS id` so the scanned 
`CircleciJob.Id` carries the workflow id, mirroring the new-records collector 
that already selects `id`. This is the minimal change and keeps both collectors 
consistent.
   
   To make the behaviour testable, the "unfinished details" DAL clauses are 
extracted into an exported helper, `tasks.UnfinishedJobsInputClauses(...)`, 
which both the collector and the new test call — so the regression test 
exercises the **production** query rather than a copy of it.
   
   ```go
   // before
   dal.Select("DISTINCT workflow_id")
   // after
   dal.Select("DISTINCT workflow_id AS id")
   ```
   
   #### Tests
   
   Adds `TestCircleciUnfinishedJobsInputIterator` (e2e, real DB). It:
   
   - seeds non-terminal and terminal jobs across two connections (including two 
jobs of the same workflow, to assert DISTINCT, and rows that must be filtered 
out);
   - runs the production query (`tasks.UnfinishedJobsInputClauses`) through the 
real `api.DalCursorIterator` — the same path the collector uses;
   - asserts each yielded `.Id` equals the workflow id, that results are 
DISTINCT per workflow, and that the status/connection filters hold.
   
   The test **fails before the fix** (`.Id` is empty → `["", ""]`) and **passes 
after**. Because it calls the exported production helper, a future revert of 
the `Select` re-breaks the test.
   
   ### Does this close any open issues?
   
   Closes #8907
   
   ### Screenshots
   
   N/A — backend collector fix, no UI changes.
   
   ### Other Information
   
   - **Scope:** one SQL projection string (`workflow_id` → `workflow_id AS id`) 
plus a test-enabling extraction. No domain-layer, schema, migration, or 
snapshot changes; existing CircleCI e2e (Extract/Convert) tests are unaffected.
   - **DCO:** all commits are signed off (`git commit -s`).
   - **Verification:** `go build ./plugins/circleci/...`, `go vet 
./plugins/circleci/...`, and the full `./plugins/circleci/e2e/...` suite pass 
against a real MySQL 8.
   - **Real-world impact:** in a live deployment, ~7,600 workflows matched the 
non-terminal filter on a single collection run — each one producing a 
`/v2/workflow//job` 500 — so the bug reliably breaks CircleCI collection at 
scale, not just in edge cases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to