jbsmith7741 opened a new issue, #8948: URL: https://github.com/apache/devlake/issues/8948
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues.

### What happened

The CircleCI `collectWorkflows` subtask fails when DevLake calls `GET /v2/pipeline/{pipeline_id}/workflow` for a pipeline whose workflow endpoint returns HTTP 500 from CircleCI, even though `GET /v2/pipeline/{id}` returns valid metadata (200 OK with a populated body). The subtask retries three times, then aborts. Because `collectWorkflows` runs before `collectJobs`, **no workflows or jobs are collected for the entire project** on that run, so CI/CD and DORA metrics go stale.

#### Environment

| Item | Value |
| --- | --- |
| DevLake version | `v1.0.3-beta12` |
| Plugin | `circleci` |
| Database | MySQL 8.x |
| CircleCI deployment type | **CircleCI Server** (self-hosted), **not** CircleCI Cloud |
| CircleCI Server version | 4.9.4 |
| Trigger | Project blueprint data collection (full or incremental) |

> **Note on CircleCI Server vs Cloud:** This bug is reproducible on a
> self-hosted **CircleCI Server** instance. It has not been verified on
> CircleCI Cloud (`circleci.com`), but the DevLake code path is identical
> for both. CircleCI Server exposes the same `/v2/` API surface; the broken
> workflow endpoint behaviour described here may be specific to self-hosted
> versions where individual pipeline records can become corrupt or stuck.

#### Error / logs

```
subtask collectWorkflows ended unexpectedly
caused by: Retry exceeded 3 times calling /v2/pipeline/<pipeline-id>/workflow. The last error was: Http DoAsync error calling [method:GET path:/v2/pipeline/<pipeline-id>/workflow query:map[]].
Response: {"message":"Internal Server Error"} (500)
```

#### Reproduced against CircleCI Server API directly

Pipeline metadata succeeds:

```bash
curl -s -H "Circle-Token: $TOKEN" \
  "https://<your-circleci-server>/api/v2/pipeline/<pipeline-id>"
# Returns: {"id":"<pipeline-id>","errors":[],"project_slug":"gh/<org>/<repo>","state":"created", ...}
```

Workflow list fails with 500:

```bash
curl -s -H "Circle-Token: $TOKEN" \
  "https://<your-circleci-server>/api/v2/pipeline/<pipeline-id>/workflow"
# Returns: {"message":"An internal server error occurred."} (HTTP 500)
```

The same pipeline ID returns 200 on the `/pipeline/{id}` endpoint but 500 on `/pipeline/{id}/workflow`. This is a CircleCI Server-side condition (corrupt or stuck pipeline record) that DevLake cannot prevent, but must handle gracefully.

#### Affected pipeline (example shape)

| Field | Value |
| --- | --- |
| Pipeline ID | `<uuid>` (valid, returned by `/project/{slug}/pipeline` pagination) |
| Project | `gh/<org>/<repo>` |
| State | `created` (stuck; workflows never materialized) |
| Trigger | Webhook / pull request event |

Pipelines in `created` state with no workflows are a known occurrence on CircleCI Server when a webhook fires but the server fails to create workflow records internally.

#### Root cause (DevLake side)

`collectWorkflows` iterates **every row** in `_tool_circleci_pipelines` for the project on full sync (no `SyncPolicy.TimeAfter` filter on the DB query). For each row it calls the workflow API. The plugin only skips **404** responses via `ignoreDeletedBuilds` in `shared.go`; **500 is not skipped**, so one bad pipeline record kills the subtask.

This is **not** the same bug as [#8907](https://github.com/apache/devlake/issues/8907) (empty workflow ID → `/v2/workflow//job` 500 in `collectJobs`).

#### Root cause (CircleCI Server side)

For at least one pipeline, CircleCI Server returns 500 on the workflow endpoint while pipeline metadata is available, likely because of a corrupt or stuck pipeline record on the server (state `created` since creation, workflows never materialized). This has been observed on a self-hosted CircleCI Server instance. It is unclear whether CircleCI Cloud can produce this condition.

### What do you expect to happen

1. When CircleCI returns **404** or **500** for a single pipeline's workflow endpoint, DevLake should **log and skip** that pipeline and continue collecting workflows for the rest of the project.
2. `collectWorkflows` should respect the blueprint **Data Time Range** (`SyncPolicy.TimeAfter`) when choosing which `_tool_circleci_pipelines` rows to iterate, so full sync does not call the workflow API for every historical pipeline row ever stored in the tool table.
3. A single bad pipeline on CircleCI Server should not block CI/CD collection for an entire project.

### How to reproduce

1. Configure a CircleCI connection pointing at a **CircleCI Server** instance.
2. Ensure `_tool_circleci_pipelines` contains at least one pipeline ID where `GET /v2/pipeline/{id}` returns 200 but `GET /v2/pipeline/{id}/workflow` returns 500.
   - These are typically pipelines in `created` state with no associated workflows, caused by a failed or corrupt webhook trigger on the server.
3. Run CircleCI data collection for that project (full sync is the most reliable trigger because `collectWorkflows` iterates all DB pipeline rows without a time filter).
4. Observe `collectWorkflows` fail with retry-exceeded 500; `collectJobs` and downstream converters do not run for the entire project.

**To find candidate pipelines on your CircleCI Server instance:**

```bash
# List project pipelines and look for state=created with no items in /workflow
curl -s -H "Circle-Token: $TOKEN" \
  "https://<your-circleci-server>/api/v2/project/gh/<org>/<repo>/pipeline" \
  | jq '.items[] | select(.state=="created") | .id'

# Then test each candidate:
curl -s -H "Circle-Token: $TOKEN" \
  "https://<your-circleci-server>/api/v2/pipeline/<candidate-id>/workflow"
```

### Anything else

#### Operator workaround (per pipeline)

Delete the bad pipeline row from the tool table, then re-sync:

```sql
DELETE FROM _tool_circleci_pipelines
WHERE id = '<pipeline-id-returning-500>';
```

This is not durable: new bad records or full-sync iteration over remaining historical rows can trigger the same failure again.

#### Related issues (not duplicates)

| Issue | Relationship |
| --- | --- |
| [#7797](https://github.com/apache/devlake/issues/7797) | `collectWorkflows` 404 after retention; time-range fix on `collectPipelines` only (closed) |
| [#8907](https://github.com/apache/devlake/issues/8907) | `/v2/workflow//job` 500 in **`collectJobs`** (empty workflow ID); closed by [#8912](https://github.com/apache/devlake/pull/8912) |
| [#8309](https://github.com/apache/devlake/issues/8309) | Malformed workflow JSON in **convert** phase (closed) |

No open issue covers **500 on `/v2/pipeline/{valid-id}/workflow`** in `collectWorkflows`.

#### Frequency

Occurs whenever collection reaches a pipeline with a broken workflow endpoint. On full sync over a project with an extended pipeline history in `_tool_circleci_pipelines`, the probability of hitting such a record increases significantly. Projects that have been active for over a year are most at risk.

### Version

v1.0.3-beta12

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
