DaanHoogland commented on code in PR #7977: URL: https://github.com/apache/cloudstack/pull/7977#discussion_r1350032294
########## server/src/main/java/com/cloud/resourcelimit/ResourceLimitManagerImpl.java: ########## @@ -1171,6 +1178,27 @@ public ResourceCountCheckTask() { @Override protected void runInContext() { + GlobalLock lock = GlobalLock.getInternLock("ResourceCheckTask"); Review Comment: :D I think this warrants renaming the PR :p ########## server/src/main/java/com/cloud/resourcelimit/ResourceLimitManagerImpl.java: ########## @@ -1171,6 +1178,27 @@ public ResourceCountCheckTask() { @Override protected void runInContext() { + GlobalLock lock = GlobalLock.getInternLock("ResourceCheckTask"); + try { + if (lock.lock(30)) { + ManagementServerHostVO msHost = managementServerHostDao.findOneByLongestRuntime(); + if (msHost == null || (msHost.getMsid() != ManagementServerNode.getManagementServerId())) { + s_logger.debug("Skipping the resource counters recalculation task on this management server"); + lock.unlock(); + return; Review Comment: does this mean the `lock.releaseRef()` is not needed? ########## framework/cluster/src/main/java/com/cloud/cluster/dao/ManagementServerHostDaoImpl.java: ########## @@ -272,4 +274,14 @@ public ManagementServerHostVO findOneInUpState(Filter filter) { return null; } + @Override + public ManagementServerHostVO findOneByLongestRuntime() { + SearchCriteria<ManagementServerHostVO> sc = StateSearch.create(); + sc.setParameters("state", ManagementServerHost.State.Up); + sc.setParameters("runid", 0); + Filter filter = new Filter(ManagementServerHostVO.class, "runid", true, 0L, 1L); Review Comment: so we order by runid and assume that means the first one is the longest running. I think this will work, but there is the issue of re-joining the cluster and time skews (this is about milliseconds. I don´t think these will be issues for this functionality but just giving a heads-up. Maybe someone else knows?? ########## server/src/main/java/com/cloud/resourcelimit/ResourceLimitManagerImpl.java: ########## @@ -1171,6 +1178,27 @@ public ResourceCountCheckTask() { @Override protected void runInContext() { + GlobalLock lock = GlobalLock.getInternLock("ResourceCheckTask"); + try { + if (lock.lock(30)) { + ManagementServerHostVO msHost = managementServerHostDao.findOneByLongestRuntime(); + if (msHost == null || (msHost.getMsid() != ManagementServerNode.getManagementServerId())) { + s_logger.debug("Skipping the resource counters recalculation task on this management server"); + lock.unlock(); Review Comment: both branches of the if have the unlock() call, maybe bring it outside (as it does not seem to depend on the if. ########## server/src/main/java/com/cloud/resourcelimit/ResourceLimitManagerImpl.java: ########## @@ -1171,6 +1178,27 @@ public ResourceCountCheckTask() { @Override protected void runInContext() { + GlobalLock lock = GlobalLock.getInternLock("ResourceCheckTask"); + try { + if (lock.lock(30)) { + ManagementServerHostVO msHost = managementServerHostDao.findOneByLongestRuntime(); + if (msHost == null || (msHost.getMsid() != ManagementServerNode.getManagementServerId())) { + s_logger.debug("Skipping the resource counters recalculation task on this management server"); Review Comment: maybe make this a trace message? ```suggestion s_logger.trace("Skipping the resource counters recalculation task on this management server"); ``` ########## framework/cluster/src/main/java/com/cloud/cluster/dao/ManagementServerHostDaoImpl.java: ########## @@ -272,4 +274,14 @@ public ManagementServerHostVO findOneInUpState(Filter filter) { return null; } + @Override + public ManagementServerHostVO findOneByLongestRuntime() { + SearchCriteria<ManagementServerHostVO> sc = StateSearch.create(); + sc.setParameters("state", ManagementServerHost.State.Up); + sc.setParameters("runid", 0); + Filter filter = new Filter(ManagementServerHostVO.class, "runid", true, 0L, 1L); Review Comment: so we order by runid and assume that means the first one is the longest running. I think this will work, but there is the issue of re-joining the cluster and time skews (this is about milliseconds. I don´t think these will be issues for this functionality but just giving a heads-up. Maybe someone else knows?? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cloudstack.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org