xBis7 commented on code in PR #4819:
URL: https://github.com/apache/ozone/pull/4819#discussion_r1223080546
##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -389,9 +399,19 @@ public SnapshotDiffResponse getSnapshotDiffReport(
String snapDiffJobKey = fsInfo.getSnapshotID() + DELIMITER +
tsInfo.getSnapshotID();
+ // Check if the job already exists in the table
+ // before getting report status.
+ if (cancel && Objects.nonNull(snapDiffJobTable.get(snapDiffJobKey)) &&
Review Comment:
> Also snapDiff job submission and cancellation should be consider two
separate options in OmSnapshotManager and SnapshotDiffManager.
>
> In
[OzoneManager](https://github.com/apache/ozone/pull/4819/files#diff-54bbfc5b5dd89a4d035bee206c53d3d11bbb579a2957abca589e313283d98af9R4523),
you can use cancel to decide which flow needs to be called.
The OzoneManager changes will certainly make everything look cleaner but
won't change anything.
> 1. If job doesn't exist, we would create a new job. Which is not needed.
Correct behavior should be return empty response with a message.
Are you saying that if the job doesn't exist then cancel should just return
and not proceed with submitting the job? Maybe return `CANCELED` response? Why
submit a new job with a cancel parameter? Or should we check for the case that
the job is stuck as `QUEUED`?
> 2. If job is not IN_PROGRESS status and it is DONE, it will not mark the
job cancel and will return first page response. I am not sure if that is
correct response to client.
I thought that's the intention here. The user wants to cancel a job only if
it's in progress. If the job is `DONE`, then why cancel it? The system has
already gone through the whole process and finished with it.
> 3. It will fail at line 409, if two or more commands are issued at the
same time for the same snapDiff job because one of them will mark it cancel
successfully and other/s will fail due to
[check](https://github.com/apache/ozone/blob/9034434285f3877246b3ebc6a7513c082931b279/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java#L1164).
If the job is not `IN_PROGRESS` then it shouldn't be marked as canceled. I
don't think the exception affects us in any case.
We try to update the job status and
* <b>cancel succeeds</b>, job status is updated to `CANCELED`.
Cancel check should pick it up but in any case where somehow we end up
updating status to `DONE` or `FAILED` or `REJECTED`, we get an exception,
status is not updated and the job is left in the table as `CANCELED`. In that
case The cleanup service picks it up and deletes everything.
* <b>cancel fails</b>, job status is not updated to `CANCELED`.
Job status is either `FAILED`, `REJECTED` or `DONE`. If it's `DONE` then
great it will return the diff report to the user. If it's one of the other
options, the job will be cleaned up by the cleanup service.
The way I see the flow when we run `getSnapshotDiffReport()` with
cancel=true,
If the job is
* `NEW`, cancel is ignored, job is registered as `QUEUED` and then submitted
* `QUEUED`, cancel is ignored, job gets submitted again
* `IN_PROGRESS`, cancel takes effect, job is marked as canceled
* `DONE`, cancel is ignored, get the diff report
* `FAILED` - `REJECTED` - `CANCELED`, cancel is ignored, get the response
until the cleanup service deletes the job.
This all seems intended behavior to me.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]