[ https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958812#comment-15958812 ]
ASF GitHub Bot commented on CLOUDSTACK-9864:
--------------------------------------------

Github user abhinandanprateek commented on a diff in the pull request:

    https://github.com/apache/cloudstack/pull/2030#discussion_r110145877

    --- Diff: plugins/hypervisors/vmware/src/com/cloud/hypervisor/vmware/manager/VmwareManagerImpl.java ---
    @@ -550,15 +552,21 @@ public boolean needRecycle(String workerTag) {
                 return true;
             }

    -        // disable time-out check until we have found out a VMware API that can check if
    -        // there are pending tasks on the subject VM
    -        /*
    -                if(System.currentTimeMillis() - startTick > _hungWorkerTimeout) {
    -                    if(s_logger.isInfoEnabled())
    -                        s_logger.info("Worker VM expired, seconds elapsed: " + (System.currentTimeMillis() - startTick) / 1000);
    -                    return true;
    -                }
    -         */
    +        // this time-out check was disabled
    +        // "until we have found out a VMware API that can check if there are pending tasks on the subject VM"
    +        // but as we expire jobs and those stale worker VMs stay around untill an MS reboot we opt in to have them removed anyway
    +        Long hungWorkerTimeout = 2 * (AsyncJobManagerImpl.JobExpireMinutes.value() + AsyncJobManagerImpl.JobCancelThresholdMinutes.value()) * MILISECONDS_PER_MINUTE;
    +        Long letsSayNow = System.currentTimeMillis();
    +        if(s_vmwareCleanOldWorderVMs.value() && letsSayNow - startTick > hungWorkerTimeout) {
    +            if(s_logger.isInfoEnabled()) {
    +                s_logger.info("Worker VM expired, seconds elapsed: " + (System.currentTimeMillis() - startTick) / 1000);
    +            }
    --- End diff --

    For timeouts you may want to use java Duration, that is much cleaner.


> cleanup stale worker VMs after job expiry time
> ----------------------------------------------
>
>                 Key: CLOUDSTACK-9864
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9864
>             Project: CloudStack
>          Issue Type: Improvement
>      Security Level: Public (Anyone can view this level - this is the default.)
>          Components: VMware
>            Reporter: Daan Hoogland
>            Assignee: Daan Hoogland
>              Labels: vmware, vsphere, workers
>
> In the present code, cleaning worker VMs after a timeout is disabled, with the
> documented reason that there is no API to query for related tasks in vCenter.
> ACS has an expiry time for jobs and a cancel time for jobs.
> - Jobs that take longer than the expiry time will have their results
> ignored.
> - Jobs that are cancelled are forcibly removed after the cancellation expiry
> time.
> Any worker remaining after expiry+cancellation will surely be stale and can
> be removed.
> As some administrators may not want this behaviour, there will be a setting,
> false by default, that guards against cleaning stale worker VMs.
> Stale worker VMs will be cleaned after 2 * (expiry-time + cancellation-time)
> as a safe margin.
> related settings:
> job.expire.minutes: 1440
> job.cancel.threshold.minutes: 60
> vmware.clean.old.worker.vms: false (new)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
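
To make the review suggestion concrete, here is a minimal sketch of how the 2 * (expiry-time + cancellation-time) margin could be expressed with java.time.Duration instead of raw millisecond arithmetic. The class name, method names, and parameters are hypothetical and are not taken from the pull request; the two minute values are simply the defaults listed under "related settings" above.

```java
import java.time.Duration;

// Hypothetical sketch, not the code from the pull request.
class WorkerVmExpirySketch {

    // Safety margin described in the issue: 2 * (expiry-time + cancellation-time).
    // The parameters stand in for job.expire.minutes and job.cancel.threshold.minutes.
    public static Duration hungWorkerTimeout(long jobExpireMinutes, long jobCancelThresholdMinutes) {
        return Duration.ofMinutes(jobExpireMinutes)
                .plus(Duration.ofMinutes(jobCancelThresholdMinutes))
                .multipliedBy(2);
    }

    // True when strictly more time than the timeout has passed since startTickMillis.
    public static boolean isExpired(long startTickMillis, long nowMillis, Duration timeout) {
        Duration elapsed = Duration.ofMillis(nowMillis - startTickMillis);
        return elapsed.compareTo(timeout) > 0;
    }

    public static void main(String[] args) {
        Duration timeout = hungWorkerTimeout(1440, 60);
        long now = System.currentTimeMillis();
        long startedThreeDaysAgo = now - Duration.ofDays(3).toMillis();

        System.out.println("timeout in minutes: " + timeout.toMinutes());
        System.out.println("3-day-old worker expired: " + isExpired(startedThreeDaysAgo, now, timeout));
    }
}
```

With the default settings the margin works out to 2 * (1440 + 60) = 3000 minutes, i.e. 50 hours. Keeping the value as a Duration means the unit conversion happens in one place (toMillis, toMinutes, getSeconds) rather than through a hand-maintained milliseconds-per-minute constant.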