danielemoraschi opened a new issue, #8787:
URL: https://github.com/apache/incubator-devlake/issues/8787

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/incubator-devlake/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What happened
   
   Clicking the cancel button in the UI (which sends `DELETE /api/pipelines/:id`) always returns
   "Operation successfully completed" (HTTP 200), but the pipeline keeps running. It has been
   observed to run for 30+ minutes after the cancel request before stopping on its own.
   
   This is a regression of #5585 and a go-git counterpart to #4188 (which only 
fixed libgit2).
   Both were closed but the underlying causes were not fully addressed.
   
   Three independent bugs combine to produce this behaviour:
   
   ---
   
   **Bug 1: `CancelPipeline` silently discards errors from `CancelTask`**
   
   `server/services/pipeline.go`:
   
   ```go
   for _, pendingTask := range pendingTasks {
       _ = CancelTask(pendingTask.ID)  // error thrown away
   }
   ```
   
   `CancelTask` calls `runningTasks.Remove(taskId)`. If the task is not in the 
in-memory map
   (race between pipeline stages, task not yet registered, or pod restart), 
`Remove` returns
   `errors.NotFound`. The error is discarded, `cancel()` is never called, the 
goroutine keeps
   running, and the API still returns 200 OK.
   
   A second consequence: tasks in future pipeline stages (`TASK_CREATED`) are 
never in
   `runningTasks`, so `CancelTask` silently fails for all of them. They remain 
`TASK_CREATED`
   in the database and the pipeline status stays `TASK_RUNNING` until the 
goroutine naturally
   finishes.
   
   ---
   
   **Bug 2: `storeRepoSnapshot` in gitextractor ignores context cancellation 
(go-git path)**
   
   `plugins/gitextractor/parser/repo_gogit.go`:
   
   ```go
   func (r *GogitRepoCollector) storeRepoSnapshot(subtaskCtx plugin.SubTaskContext, commitList []*object.Commit) error {
       ctx := subtaskCtx.GetContext()
       for _, commit := range commitList {      // ← no ctx.Done() check between commits
           // ...
           for _, p := range patch.Stats() {
               blameResults, err := gogit.Blame(commit, fileName)  // ← no context parameter
   ```
   
   `gogit.Blame()` has no context parameter: it performs a full in-process blame computation
   and cannot be interrupted. For large repositories with thousands of commits, each touching
   many files, this loop runs for **30+ minutes** and does not respond to context
   cancellation. This is the primary cause of the long delay observed after pressing cancel.
   
   Issue #4188 fixed the same problem for the libgit2 implementation 
(`repo_libgit2.go`) but
   the go-git implementation was never addressed.
   
   ---
   
   **Bug 3: Cancelled tasks are marked `TASK_FAILED` instead of 
`TASK_CANCELLED`**
   
   `core/runner/run_task.go`: the deferred status update always writes 
`TASK_FAILED` when
   `err != nil`, with no special case for context cancellation:
   
   ```go
   dbe := db.UpdateColumns(task, []dal.DalSet{
       {ColumnName: "status", Value: models.TASK_FAILED},  // wrong for cancellations
       // ...
   })
   ```
   
   The final pipeline status also becomes `TASK_FAILED` or `TASK_PARTIAL` 
rather than
   `TASK_CANCELLED`, making it impossible to distinguish a failed run from a 
cancelled one in
   the UI or dashboards.
   
   ### What do you expect to happen
   
   - Pressing cancel on a running pipeline stops it promptly (within seconds 
for HTTP-based plugins)
   - The pipeline and all its tasks (running and not-yet-started) are 
immediately marked `TASK_CANCELLED` in the database
   - The API returns a non-200 status or a meaningful error if cancellation could not be applied
   - A cancelled run is distinguishable from a failed run in the UI
   
   ### How to reproduce
   
   **For the 30+ minute hang (Bug 2):**
   1. Configure a blueprint with a large git repository (thousands of commits)
   2. Trigger the pipeline and wait for `collectCommits` / blame subtask to 
begin
   3. Click cancel
   4. Observe: "Operation successfully completed" in the UI, but pipeline 
status stays `RUNNING` for 30+ minutes
   
   **For the silent cancel failure (Bug 1):**
   1. Run a multi-stage pipeline
   2. Click cancel immediately after one stage completes and before the next 
stage's tasks appear in progress
   3. Observe: cancel returns 200 OK, next stage starts and runs to completion
   
   ### Anything else
   
   Affected files:
   
   | File | Issue |
   |---|---|
   | `server/services/pipeline.go:464` | `_ = CancelTask(...)` silently discards errors; unstarted tasks never marked cancelled |
   | `plugins/gitextractor/parser/repo_gogit.go:526` | No `ctx.Done()` check in commit loop; `gogit.Blame()` has no context |
   | `core/runner/run_task.go:91` | Context-cancelled tasks written as `TASK_FAILED` instead of `TASK_CANCELLED` |
   
   Suggested fixes:
   - Log or return errors from `CancelTask` instead of discarding them
   - In `CancelPipeline` for a running pipeline: immediately set all 
`TASK_CREATED` tasks and the pipeline itself to `TASK_CANCELLED` in the DB
   - In `storeRepoSnapshot`: add a `ctx.Done()` check at the top of the commit 
loop; investigate whether go-git exposes a context-aware blame API
   - In `RunTask`: use `TASK_CANCELLED` when `errors.Is(err, context.Canceled)` 
is true
   
   Related: #5585, #4188
   
   ### Version
   
   b68c102f2
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
