[ https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958812#comment-15958812 ]
ASF GitHub Bot commented on CLOUDSTACK-9864:
--------------------------------------------
Github user abhinandanprateek commented on a diff in the pull request:
https://github.com/apache/cloudstack/pull/2030#discussion_r110145877
--- Diff: plugins/hypervisors/vmware/src/com/cloud/hypervisor/vmware/manager/VmwareManagerImpl.java ---
    @@ -550,15 +552,21 @@ public boolean needRecycle(String workerTag) {
                     return true;
                 }
    -            // disable time-out check until we have found out a VMware API that can check if
    -            // there are pending tasks on the subject VM
    -            /*
    -            if(System.currentTimeMillis() - startTick > _hungWorkerTimeout) {
    -                if(s_logger.isInfoEnabled())
    -                    s_logger.info("Worker VM expired, seconds elapsed: " + (System.currentTimeMillis() - startTick) / 1000);
    -                return true;
    -            }
    -            */
    +            // this time-out check was disabled
    +            // "until we have found out a VMware API that can check if there are pending tasks on the subject VM"
    +            // but as we expire jobs and those stale worker VMs stay around untill an MS reboot we opt in to have them removed anyway
    +            Long hungWorkerTimeout = 2 * (AsyncJobManagerImpl.JobExpireMinutes.value() + AsyncJobManagerImpl.JobCancelThresholdMinutes.value()) * MILISECONDS_PER_MINUTE;
    +            Long letsSayNow = System.currentTimeMillis();
    +            if(s_vmwareCleanOldWorderVMs.value() && letsSayNow - startTick > hungWorkerTimeout) {
    +                if(s_logger.isInfoEnabled()) {
    +                    s_logger.info("Worker VM expired, seconds elapsed: " + (System.currentTimeMillis() - startTick) / 1000);
    +                }
--- End diff --
For timeouts you may want to use Java's Duration (java.time.Duration); that is much cleaner than hand-rolled millisecond arithmetic.
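A minimal sketch of what that could look like, assuming Java 8+ is available to the plugin. The class and method names (HungWorkerCheck, hungWorkerTimeout, isExpired) are hypothetical and not part of CloudStack; the two minute values would come from the job expiry and job cancel threshold settings used in the diff.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for illustration only; not part of CloudStack. */
final class HungWorkerCheck {
    private HungWorkerCheck() {}

    /** Safety margin of 2 * (job expiry + job cancel threshold), both given in minutes. */
    static Duration hungWorkerTimeout(long jobExpireMinutes, long jobCancelThresholdMinutes) {
        return Duration.ofMinutes(jobExpireMinutes)
                .plus(Duration.ofMinutes(jobCancelThresholdMinutes))
                .multipliedBy(2);
    }

    /** True only when cleanup is enabled and the worker has run strictly longer than the timeout. */
    static boolean isExpired(boolean cleanupEnabled, Instant startedAt, Instant now, Duration timeout) {
        if (!cleanupEnabled) {
            return false;
        }
        Duration elapsed = Duration.between(startedAt, now);
        return elapsed.compareTo(timeout) > 0;
    }
}
```

With job.expire.minutes = 1440 and job.cancel.threshold.minutes = 60, hungWorkerTimeout(1440, 60) comes to 3000 minutes, i.e. 50 hours. A millisecond tick such as startTick can be converted with Instant.ofEpochMilli(startTick), and the elapsed time for the log message is then available via Duration.getSeconds() rather than a manual division by 1000.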
> cleanup stale worker VMs after job expiry time
> ----------------------------------------------
>
> Key: CLOUDSTACK-9864
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9864
> Project: CloudStack
> Issue Type: Improvement
> Security Level: Public (Anyone can view this level - this is the default.)
> Components: VMware
> Reporter: Daan Hoogland
> Assignee: Daan Hoogland
> Labels: vmware, vsphere, workers
>
> In the present code, cleaning worker VMs after a timeout is disabled, with the
> documented reason that there is no API to query for related tasks in vCenter.
> ACS has an expiry time for jobs and a cancel time for jobs.
> - Jobs that take longer than the expiry time will have their results
> ignored.
> - Jobs that are cancelled are forcibly removed after the cancellation expiry
> time.
> Any worker remaining after expiry + cancellation will surely be stale and can
> be removed.
> As some administrators may not want this behaviour, the cleanup will be
> controlled by a setting that is false by default, so stale worker VMs are only
> cleaned when it is explicitly enabled.
> Stale worker VMs will be cleaned after 2 * (expiry-time + cancellation-time)
> as a safe margin.
> Related settings:
> job.expire.minutes: 1440
> job.cancel.threshold.minutes: 60
> vmware.clean.old.worker.vms: false (new)