jbsmith7741 opened a new issue, #8948:
URL: https://github.com/apache/devlake/issues/8948

   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   
   The CircleCI `collectWorkflows` subtask fails when DevLake calls
   `GET /v2/pipeline/{pipeline_id}/workflow` for a pipeline whose workflow
   endpoint returns HTTP 500 from CircleCI, even though `GET /v2/pipeline/{id}`
   returns valid metadata (200 OK with a populated body).
   
   The subtask retries three times, then aborts. Because `collectWorkflows` runs
   before `collectJobs`, **no workflows or jobs are collected for the entire
   project** on that run — CI/CD and DORA metrics go stale.
   
   #### Environment
   
   | | |
   | --- | --- |
   | DevLake version | `v1.0.3-beta12` |
   | Plugin | `circleci` |
   | Database | MySQL 8.x |
   | CircleCI deployment type | **CircleCI Server** (self-hosted) — **not** 
CircleCI Cloud |
   | CircleCI Server version | 4.9.4 |
   | Trigger | Project blueprint data collection (full or incremental) |
   
   > **Note on CircleCI Server vs Cloud:** This bug is reproducible on a
   > self-hosted **CircleCI Server** instance. It has not been verified on
   > CircleCI Cloud (`circleci.com`), but the DevLake code path is identical
   > for both. CircleCI Server exposes the same `/v2/` API surface; the broken
   > workflow endpoint behaviour described here may be specific to self-hosted
   > versions where individual pipeline records can become corrupt or stuck.
   
   #### Error / logs
   
   ```
   subtask collectWorkflows ended unexpectedly
   caused by: Retry exceeded 3 times calling /v2/pipeline/<pipeline-id>/workflow.
   The last error was: Http DoAsync error calling [method:GET path:/v2/pipeline/<pipeline-id>/workflow query:map[]].
   Response: {"message":"Internal Server Error"} (500)
   ```
   
   
   #### Reproduced against CircleCI Server API directly
   
   Pipeline metadata succeeds:
   
   ```bash
   curl -s -H "Circle-Token: $TOKEN" \
     "https://<your-circleci-server>/api/v2/pipeline/<pipeline-id>"
   # Returns: {"id":"<pipeline-id>","errors":[],"project_slug":"gh/<org>/<repo>","state":"created", ...}
   ```
   
   Workflow list fails with 500:
   
   ```bash
   curl -s -H "Circle-Token: $TOKEN" \
     "https://<your-circleci-server>/api/v2/pipeline/<pipeline-id>/workflow"
   # Returns: {"message":"An internal server error occurred."} (HTTP 500)
   ```
   
   The same pipeline ID returns 200 on the `/pipeline/{id}` endpoint but 500 on
   `/pipeline/{id}/workflow`. This is a CircleCI Server-side condition (corrupt
   or stuck pipeline record) that DevLake cannot prevent, but must handle
   gracefully.
   
   #### Affected pipeline (example shape)
   
   | Field | Value |
   | --- | --- |
   | Pipeline ID | `<uuid>` (valid, returned by `/project/{slug}/pipeline` pagination) |
   | Project | `gh/<org>/<repo>` |
   | State | `created` (stuck — workflows never materialized) |
   | Trigger | Webhook / pull request event |
   
   Pipelines in `created` state with no workflows are a known occurrence on
   CircleCI Server when a webhook fires but the server fails to create workflow
   records internally.
   
   #### Root cause (DevLake side)
   
   `collectWorkflows` iterates **every row** in `_tool_circleci_pipelines` for the
   project on full sync (there is no `SyncPolicy.TimeAfter` filter on the DB
   query). For each row it calls the workflow API. The plugin only skips **404**
   responses via `ignoreDeletedBuilds` in `shared.go`; **500 is not skipped**, so
   one bad pipeline record kills the subtask.
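
To make the failure mode concrete, here is a minimal, standalone Go simulation. It is not DevLake code: `collectAll` and `skip404Only` are hypothetical names, HTTP responses are faked with a status map, and retries are not modeled. The predicate mirrors the behaviour described above for `ignoreDeletedBuilds` (skip 404 only), so the stuck pipeline aborts the loop before `p3` is ever reached.

```go
package main

import (
	"fmt"
	"net/http"
)

// skip404Only mirrors the described ignoreDeletedBuilds behaviour:
// only 404 (deleted / retention-expired builds) is treated as skippable.
func skip404Only(status int) bool {
	return status == http.StatusNotFound
}

// collectAll walks pipelines in order and stops at the first status code that
// is neither 200 nor skippable, returning the IDs collected before the failure.
func collectAll(ids []string, statusOf map[string]int, skip func(int) bool) ([]string, error) {
	var collected []string
	for _, id := range ids {
		status := statusOf[id]
		switch {
		case status == http.StatusOK:
			collected = append(collected, id)
		case skip(status):
			continue // e.g. a 404 for a deleted build
		default:
			return collected, fmt.Errorf("pipeline %s: workflow endpoint returned %d", id, status)
		}
	}
	return collected, nil
}

func main() {
	ids := []string{"p1", "p2-stuck", "p3"}
	statusOf := map[string]int{"p1": 200, "p2-stuck": 500, "p3": 200}
	collected, err := collectAll(ids, statusOf, skip404Only)
	fmt.Println(collected, err)
	// prints: [p1] pipeline p2-stuck: workflow endpoint returned 500
}
```

The healthy pipeline `p3` is never visited, which matches the report that one bad record blocks collection for the whole project.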
   
   This is **not** the same bug as [#8907](https://github.com/apache/devlake/issues/8907)
   (empty workflow ID → `/v2/workflow//job` 500 in `collectJobs`).
   
   #### Root cause (CircleCI Server side)
   
   For at least one pipeline, CircleCI Server returns 500 on the workflow endpoint
   while pipeline metadata is available, likely because of a corrupt or stuck
   pipeline record on the server (state `created` since creation, workflows never
   materialized). This has been observed on a self-hosted CircleCI Server
   instance; it is unclear whether CircleCI Cloud can produce this condition.
   
   
   ### What do you expect to happen
   
   
   1. When CircleCI returns **404** or **500** for a single pipeline's workflow
      endpoint, DevLake should **log and skip** that pipeline and continue
      collecting workflows for the rest of the project.
   2. `collectWorkflows` should respect the blueprint **Data Time Range**
      (`SyncPolicy.TimeAfter`) when choosing which `_tool_circleci_pipelines`
      rows to iterate, so that a full sync does not call the workflow API for
      every historical pipeline row ever stored in the tool table.
   3. A single bad pipeline on CircleCI Server should not block CI/CD collection
      for an entire project.
   
   ### How to reproduce
   
   
   1. Configure a CircleCI connection pointing at a **CircleCI Server** instance.
   2. Ensure `_tool_circleci_pipelines` contains at least one pipeline ID where
      `GET /v2/pipeline/{id}` returns 200 but
      `GET /v2/pipeline/{id}/workflow` returns 500.
      - These are typically pipelines in `created` state with no associated
        workflows, caused by a failed or corrupt webhook trigger on the server.
   3. Run CircleCI data collection for that project (full sync is the most
      reliable trigger because `collectWorkflows` iterates all DB pipeline rows
      without a time filter).
   4. Observe `collectWorkflows` fail with retry-exceeded 500; `collectJobs` and
      downstream converters do not run for the entire project.
   
   **To find candidate pipelines on your CircleCI Server instance:**
   
   ```bash
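   # List project pipelines and look for state=created with no items in /workflow.
   # This reads only the first page; repeat with ?page-token=<next_page_token> to reach older pipelines.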
   curl -s -H "Circle-Token: $TOKEN" \
     "https://<your-circleci-server>/api/v2/project/gh/<org>/<repo>/pipeline" \
     | jq '.items[] | select(.state=="created") | .id'
   
   # Then test each candidate:
   curl -s -H "Circle-Token: $TOKEN" \
     "https://<your-circleci-server>/api/v2/pipeline/<candidate-id>/workflow"
   ```
   
   
   ### Anything else
   
   
   #### Operator workaround (per pipeline)
   
   Delete the bad pipeline row from the tool table, then re-sync:
   
   ```sql
   DELETE FROM _tool_circleci_pipelines
   WHERE id = '<pipeline-id-returning-500>';
   ```
   
   This is not durable — new bad records or full-sync iteration over remaining
   historical rows can trigger the same failure again.
   
   #### Related issues (not duplicates)
   
   | Issue | Relationship |
   | --- | --- |
   | [#7797](https://github.com/apache/devlake/issues/7797) | `collectWorkflows` 404 after retention; time-range fix on `collectPipelines` only (closed) |
   | [#8907](https://github.com/apache/devlake/issues/8907) | `/v2/workflow//job` 500 in **`collectJobs`** (empty workflow ID); closed, see [#8912](https://github.com/apache/devlake/pull/8912) |
   | [#8309](https://github.com/apache/devlake/issues/8309) | Malformed workflow JSON in **convert** phase (closed) |
   
   No open issue covers **500 on `/v2/pipeline/{valid-id}/workflow`** in
   `collectWorkflows`.
   
   #### Frequency
   
   Occurs whenever collection reaches a pipeline with a broken workflow endpoint.
   Because a full sync visits every row in `_tool_circleci_pipelines`, the longer
   a project's stored pipeline history, the more likely collection is to hit such
   a record. Projects that have been active for over a year are most at risk.
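
As a rough illustration of why long histories matter, assume (hypothetically) that each stored pipeline row is independently broken with probability $p$. A full sync over $n$ rows then hits at least one broken record with probability

$$
P = 1 - (1 - p)^{n}
$$

With an illustrative $p = 0.001$, $n = 500$ gives $P \approx 0.39$, while $n = 5000$ gives $P \approx 0.99$. Real failures are unlikely to be independent, so these numbers only show the trend; applying the Data Time Range filter bounds $n$ to the pipelines inside the window.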
   
   ### Version
   
   v1.0.3-beta12
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   