qwtsc commented on PR #290:
URL:
https://github.com/apache/incubator-hugegraph-computer/pull/290#issuecomment-1827323729
@javeme @imbajin I finally found the actual reason.
* The default delete policy in k8s has been `background` since v1.20.
* On top of that, the finalizer is added via `replaceCR`, so the operator
will sometimes miss the delete event generated by your client.
```java
private boolean finalizer(HugeGraphComputerJob computerJob) {
    if (computerJob.addFinalizer(FINALIZER_NAME)) {
        // ---> this is part of the reason; can you patch the CR instead?
        this.replaceCR(computerJob);
        return true;
    }
    ComputerJobStatus status = computerJob.getStatus();
    if (computerJob.isMarkedForDeletion()) {
        if (!JobStatus.finished(status.getJobStatus())) {
            status.setJobStatus(JobStatus.CANCELLED.name());
            this.updateStatus(computerJob);
        } else {
            if (computerJob.removeFinalizer(FINALIZER_NAME)) {
                this.replaceCR(computerJob);
            }
        }
        return true;
    } else {
        if (JobStatus.finished(status.getJobStatus())) {
            if (this.autoDestroyPod) {
                this.deleteCR(computerJob);
            }
            return true;
        }
    }
    return false;
}
```
* k8s will delete the owned resources in the background whether the CR has
been deleted or not, so the calculated job status always remains INITIALIZING
and the operator won't deploy again.
```java
ComputerJobComponent observed = this.observeComponent(computerJob);
// ----> calculated status remains INITIALIZING forever, so it won't be
//       deployed again.
if (!this.updateStatus(observed) && request.retryTimes() == 0) {
    LOG.debug("Wait status to be stable before taking further actions");
    return OperatorResult.NO_REQUEUE;
}
if (Objects.equals(computerJob.getStatus().getJobStatus(),
                   JobStatus.RUNNING.name())) {
    String crName = computerJob.getMetadata().getName();
    LOG.info("ComputerJob {} already running, no action", crName);
    return OperatorResult.NO_REQUEUE;
}
ComputerJobDeployer deployer = new ComputerJobDeployer(this.kubeClient,
                                                       this.config);
deployer.deploy(observed);
return OperatorResult.NO_REQUEUE;
```
* As a result, we see the unit-test case fail again and again.
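
To make the early-exit path concrete, here is a minimal, self-contained sketch of the reconcile decision. It is not the operator's code: `ReconcileSketch`, `statusChanged` and the `Result` values are hypothetical names, statuses are plain strings, and I am assuming that `updateStatus` returns `false` when the observed status equals the stored one.

```java
import java.util.Objects;

public class ReconcileSketch {
    // Hypothetical outcomes; the real operator returns OperatorResult.NO_REQUEUE
    // in all three cases, they are split here only to show which branch ran.
    enum Result { NO_REQUEUE_WAITING, NO_REQUEUE_RUNNING, DEPLOYED }

    // Assumed stand-in for updateStatus(): true only when the status changed.
    static boolean statusChanged(String stored, String observed) {
        return !Objects.equals(stored, observed);
    }

    static Result reconcile(String stored, String observed, int retryTimes) {
        if (!statusChanged(stored, observed) && retryTimes == 0) {
            // Status is "stable", so we return before the deploy step.
            return Result.NO_REQUEUE_WAITING;
        }
        if ("RUNNING".equals(observed)) {
            return Result.NO_REQUEUE_RUNNING;
        }
        return Result.DEPLOYED;
    }

    public static void main(String[] args) {
        // Owned resources were removed, status is recalculated as INITIALIZING
        // and equals the stored value: the deploy step is never reached.
        System.out.println(reconcile("INITIALIZING", "INITIALIZING", 0));
        // First reconcile of a fresh CR: the status changes, so deploy runs.
        System.out.println(reconcile(null, "INITIALIZING", 0));
    }
}
```

Under these assumptions, once the stored and calculated status are both INITIALIZING and `retryTimes` is 0, every reconcile takes the first branch, which matches the stuck test.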
### solution
My solution is to set the delete policy back to `foreground`. Everyone is
happy now.
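
For readers less familiar with the two policies, here is a toy model of the documented ordering only: with foreground deletion the dependents are removed before the owner disappears, with background deletion the owner goes first and the dependents are collected afterwards. `DeletionOrderSketch`, `deleteOwner` and the object names are made up for illustration, and the model deliberately ignores custom finalizers such as ours. With the fabric8 client I believe the policy is selected via `withPropagationPolicy(DeletionPropagation.FOREGROUND)`, but treat that as an assumption rather than a quote from the PR.

```java
import java.util.ArrayList;
import java.util.List;

public class DeletionOrderSketch {
    enum Policy { FOREGROUND, BACKGROUND }

    // Returns the order in which objects disappear in this simplified model.
    static List<String> deleteOwner(String owner, List<String> dependents,
                                    Policy policy) {
        List<String> removed = new ArrayList<>();
        if (policy == Policy.FOREGROUND) {
            // Owner stays visible (marked for deletion) until dependents are gone.
            removed.addAll(dependents);
            removed.add(owner);
        } else {
            // Owner is removed first; dependents are garbage-collected later.
            removed.add(owner);
            removed.addAll(dependents);
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> pods = List.of("master-pod", "worker-pod");
        System.out.println(deleteOwner("my-job", pods, Policy.FOREGROUND));
        System.out.println(deleteOwner("my-job", pods, Policy.BACKGROUND));
    }
}
```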
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]