imbajin commented on code in PR #2937:
URL: https://github.com/apache/hugegraph/pull/2937#discussion_r3323507986


##########
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/task/DistributedTaskScheduler.java:
##########
@@ -316,14 +355,18 @@ protected <V> HugeTask<V> deleteFromDB(Id id) {
 
     @Override
     public <V> HugeTask<V> delete(Id id, boolean force) {
-        if (!force) {
-            // Change status to DELETING, perform the deletion operation 
through automatic
-            // scheduling.
+        HugeTask<?> task = this.taskWithoutResult(id);
+
+        if (!force && !task.completed()) {
+            // Check task status: can't delete running tasks without force
             this.updateStatus(id, null, TaskStatus.DELETING);

Review Comment:
   Deleting a running task can race with the task's own cancellation callbacks 
and resurrect the task record. In this branch the scheduler cancels the 
in-memory task and immediately removes the task vertex from DB, but 
`HugeTask.cancel()` invokes `TaskCallable.cancelled()`, and both 
`UserJob.cancelled()` / `SysJob.cancelled()` call `save()`. Since `taskDone()` 
is now a no-op, a long-running job deleted via `delete(id, false)` can be 
removed here and then written back as `CANCELLED` during cancellation cleanup. 
Please make deletion win over later task saves, or defer physical deletion 
until the runner has fully stopped and can no longer persist the task.



##########
hugegraph-server/hugegraph-test/src/main/java/org/apache/hugegraph/core/TaskCoreTest.java:
##########
@@ -76,20 +77,22 @@ public void testTask() throws TimeoutException {
         Assert.assertEquals(id, task.id());
         Assert.assertFalse(task.completed());
 
-        Assert.assertThrows(IllegalArgumentException.class, () -> {
-            scheduler.delete(id, false);
-        }, e -> {
-            Assert.assertContains("Can't delete incomplete task '88888'",
-                                  e.getMessage());
-        });
+        if (scheduler.getClass().equals(StandardTaskScheduler.class)) {

Review Comment:
   This PR changes the high-risk `DistributedTaskScheduler` paths for 
`cancel()`, `delete()`, `close()`, and restore behavior, but the updated tests 
mostly skip distributed-specific assertions or only relax status expectations. 
I could not find a test that actually exercises the new distributed 
`delete(false)` / running-task cancellation paths, which is where the 
DB-delete-versus-callback-save race above lives. Please add distributed 
scheduler coverage for at least running task + `delete(false)`, running task + 
`cancel()`, and close/cron shutdown behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to