xBis7 commented on code in PR #4819:
URL: https://github.com/apache/ozone/pull/4819#discussion_r1223080546


##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -389,9 +399,19 @@ public SnapshotDiffResponse getSnapshotDiffReport(
     String snapDiffJobKey = fsInfo.getSnapshotID() + DELIMITER +
         tsInfo.getSnapshotID();
 
+    // Check if the job already exists in the table
+    // before getting report status.
+    if (cancel && Objects.nonNull(snapDiffJobTable.get(snapDiffJobKey)) &&

Review Comment:
   > Also snapDiff job submission and cancellation should be consider two 
separate options in OmSnapshotManager and SnapshotDiffManager.
   > 
   > In 
[OzoneManager](https://github.com/apache/ozone/pull/4819/files#diff-54bbfc5b5dd89a4d035bee206c53d3d11bbb579a2957abca589e313283d98af9R4523),
 you can use cancel to decide which flow needs to be called.
   
   The OzoneManager changes will certainly make everything look cleaner but 
won't change anything.
   
   > 1. If job doesn't exist, we would create a new job. Which is not needed. 
Correct behavior should be return empty response with a message.
   
   Are you saying that if the job doesn't exist then cancel should just return 
and not proceed with submitting the job? Maybe return CANCELED response? Why 
submit a new job with a cancel parameter? Or should we check for the case that 
the job is stuck as QUEUED?
   
   > 2. If job is not IN_PROGRESS status and it is DONE, it will not mark the 
job cancel and will return first page response. I am not sure if that is 
correct response to client.
   
   I thought that's the intention here. The user wants to cancel a job only if 
it's in progress. If the job is DONE, then why cancel it? The system has 
already gone through the whole process and finished with it. 
   
   > 3. It will fail at line 409, if two or more commands are issued at the 
same time for the same snapDiff job because one of them will mark it cancel 
successfully and other/s will fail due to 
[check](https://github.com/apache/ozone/blob/9034434285f3877246b3ebc6a7513c082931b279/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java#L1164).
   
   If the job is not IN_PROGRESS then it shouldn't be marked as canceled. I 
don't think the exception affects us in any case. 
   
   We try to update the job status and
   * <b>cancel succeeds</b>, job status is updated to CANCELED. 
   Cancel check should pick it up but in any case where somehow we end up 
updating status to DONE or FAILED or REJECTED, we get an exception, status is 
not updated and the job is left in the table as CANCELED. In that case The 
cleanup service picks it up and deletes everything.
   * <b>cancel fails</b>, job status is not updated to CANCELED. 
   Job status is either FAILED, REJECTED or DONE. If it's DONE then great it 
will return the diff report to the user. If it's one of the other options, the 
job will be cleaned up by the cleanup service.
   
   The way I see the flow, we run `getSnapshotDiffReport()` with cancel=true. 
   If the job is
   * NEW, cancel is ignored, job is registered as QUEUED and then submitted
   * QUEUED, cancel is ignored, gets submitted again
   * IN_PROGRESS, cancel takes effect, job is marked as canceled
   * DONE, cancel is ignored, get the diff report 
   * FAILED - REJECTED - CANCELED, cancel is ignored, get the response until 
the cleanup service deletes the job.
   
   This all seems intended behavior to me.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to