danielemoraschi opened a new issue, #8787: URL: https://github.com/apache/incubator-devlake/issues/8787
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and found no similar issues.

### What happened

Clicking the cancel button in the UI (which fires `DELETE /api/pipelines/:id`) always returns "Operation successfully completed" (HTTP 200), but the pipeline continues running. It keeps running for 30+ minutes after the cancel request before it stops on its own.

This is a regression of #5585 and a go-git counterpart to #4188 (which only fixed libgit2). Both were closed but the underlying causes were not fully addressed.

Three independent bugs combine to produce this behaviour:

---

**Bug 1: `CancelPipeline` silently discards errors from `CancelTask`**

`server/services/pipeline.go`:

```go
for _, pendingTask := range pendingTasks {
    _ = CancelTask(pendingTask.ID) // error thrown away
}
```

`CancelTask` calls `runningTasks.Remove(taskId)`. If the task is not in the in-memory map (race between pipeline stages, task not yet registered, or pod restart), `Remove` returns `errors.NotFound`. The error is discarded, `cancel()` is never called, the goroutine keeps running, and the API still returns 200 OK.

A second consequence: tasks in future pipeline stages (`TASK_CREATED`) are never in `runningTasks`, so `CancelTask` silently fails for all of them. They remain `TASK_CREATED` in the database and the pipeline status stays `TASK_RUNNING` until the goroutine naturally finishes.

---

**Bug 2: `storeRepoSnapshot` in gitextractor ignores context cancellation (go-git path)**

`plugins/gitextractor/parser/repo_gogit.go`:

```go
func (r *GogitRepoCollector) storeRepoSnapshot(subtaskCtx plugin.SubTaskContext, commitList []*object.Commit) error {
    ctx := subtaskCtx.GetContext()
    for _, commit := range commitList { // ← no ctx.Done() check between commits
        // ...
        for _, p := range patch.Stats() {
            blameResults, err := gogit.Blame(commit, fileName) // ← no context parameter
```

`gogit.Blame()` has no context parameter: it performs a full in-process blame computation and cannot be interrupted. For large repositories with thousands of commits, each touching many files, this loop runs for **30+ minutes** and is completely unresponsive to context cancellation. This is the primary cause of the long delay observed after pressing cancel.

Issue #4188 fixed the same problem for the libgit2 implementation (`repo_libgit2.go`) but the go-git implementation was never addressed.

---

**Bug 3: Cancelled tasks are marked `TASK_FAILED` instead of `TASK_CANCELLED`**

`core/runner/run_task.go`: the deferred status update always writes `TASK_FAILED` when `err != nil`, with no special case for context cancellation:

```go
dbe := db.UpdateColumns(task, []dal.DalSet{
    {ColumnName: "status", Value: models.TASK_FAILED}, // wrong for cancellations
    ...
})
```

The final pipeline status also becomes `TASK_FAILED` or `TASK_PARTIAL` rather than `TASK_CANCELLED`, making it impossible to distinguish a failed run from a cancelled one in the UI or dashboards.

### What do you expect to happen

- Pressing cancel on a running pipeline stops it promptly (within seconds for HTTP-based plugins)
- The pipeline and all its tasks (running and not-yet-started) are immediately marked `TASK_CANCELLED` in the database
- The API returns a non-200 or a meaningful error if cancellation could not be applied
- A cancelled run is distinguishable from a failed run in the UI

### How to reproduce

**For the 30+ minute hang (Bug 2):**

1. Configure a blueprint with a large git repository (thousands of commits)
2. Trigger the pipeline and wait for `collectCommits` / blame subtask to begin
3. Click cancel
4. Observe: "Operation successfully completed" in the UI, but pipeline status stays `RUNNING` for 30+ minutes

**For the silent cancel failure (Bug 1):**

1. Run a multi-stage pipeline
2. Click cancel immediately after one stage completes and before the next stage's tasks appear in progress
3. Observe: cancel returns 200 OK, next stage starts and runs to completion

### Anything else

Affected files:

| File | Issue |
|---|---|
| `server/services/pipeline.go:464` | `_ = CancelTask(...)` silently discards errors; unstarted tasks never marked cancelled |
| `plugins/gitextractor/parser/repo_gogit.go:526` | No `ctx.Done()` check in commit loop; `gogit.Blame()` has no context |
| `core/runner/run_task.go:91` | Context-cancelled tasks written as `TASK_FAILED` instead of `TASK_CANCELLED` |

Suggested fixes:

- Log or return errors from `CancelTask` instead of discarding them
- In `CancelPipeline` for a running pipeline: immediately set all `TASK_CREATED` tasks and the pipeline itself to `TASK_CANCELLED` in the DB
- In `storeRepoSnapshot`: add a `ctx.Done()` check at the top of the commit loop; investigate whether go-git exposes a context-aware blame API
- In `RunTask`: use `TASK_CANCELLED` when `errors.Is(err, context.Canceled)` is true

Related: #5585, #4188

### Version

b68c102f2

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
