xBis7 commented on code in PR #4819:
URL: https://github.com/apache/ozone/pull/4819#discussion_r1223080546
##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -389,9 +399,19 @@ public SnapshotDiffResponse getSnapshotDiffReport(
String snapDiffJobKey = fsInfo.getSnapshotID() + DELIMITER +
tsInfo.getSnapshotID();
+ // Check if the job already exists in the table
+ // before getting report status.
+ if (cancel && Objects.nonNull(snapDiffJobTable.get(snapDiffJobKey)) &&
Review Comment:
> Also snapDiff job submission and cancellation should be consider two
separate options in OmSnapshotManager and SnapshotDiffManager.
>
> In
[OzoneManager](https://github.com/apache/ozone/pull/4819/files#diff-54bbfc5b5dd89a4d035bee206c53d3d11bbb579a2957abca589e313283d98af9R4523),
you can use cancel to decide which flow needs to be called.
The OzoneManager changes will certainly make everything look cleaner but
won't change anything.
> 1. If job doesn't exist, we would create a new job. Which is not needed.
Correct behavior should be return empty response with a message.
Are you saying that if the job doesn't exist then cancel should just return
and not proceed with submitting the job? Maybe return CANCELED response? Why
submit a new job with a cancel parameter? Or should we check for the case that
the job is stuck as QUEUED?
> 2. If job is not IN_PROGRESS status and it is DONE, it will not mark the
job cancel and will return first page response. I am not sure if that is
correct response to client.
I thought that's the intention here. The user wants to cancel a job only if
it's in progress. If the job is DONE, then why cancel it? The system has
already gone through the whole process and finished with it.
> 3. It will fail at line 409, if two or more commands are issued at the
same time for the same snapDiff job because one of them will mark it cancel
successfully and other/s will fail due to
[check](https://github.com/apache/ozone/blob/9034434285f3877246b3ebc6a7513c082931b279/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java#L1164).
If the job is not IN_PROGRESS then it shouldn't be marked as canceled. I
don't think the exception affects us in any case.
We try to update the job status and
* <b>cancel succeeds</b>, job status is updated to CANCELED.
Cancel check should pick it up but in any case where somehow we end up
updating status to DONE or FAILED or REJECTED, we get an exception, status is
not updated and the job is left in the table as CANCELED. In that case The
cleanup service picks it up and deletes everything.
* <b>cancel fails</b>, job status is not updated to CANCELED.
Job status is either FAILED, REJECTED or DONE. If it's DONE then great it
will return the diff report to the user. If it's one of the other options, the
job will be cleaned up by the cleanup service.
The way I see the flow, we run `getSnapshotDiffReport()` with cancel=true.
If the job is
* NEW, cancel is ignored, job is registered as QUEUED and then submitted
* QUEUED, cancel is ignored, gets submitted again
* IN_PROGRESS, cancel takes effect, job is marked as canceled
* DONE, cancel is ignored, get the diff report
* FAILED - REJECTED - CANCELED, cancel is ignored, get the response until
the cleanup service deletes the job.
This all seems intended behavior to me.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]