lhotari commented on a change in pull request #9393:
URL: https://github.com/apache/pulsar/pull/9393#discussion_r572656526



##########
File path: 
pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/impl/LinuxBrokerHostUsageImpl.java
##########
@@ -75,18 +75,17 @@ public LinuxBrokerHostUsageImpl(int 
hostUsageCheckIntervalMin,
         this.lastCollection = 0L;
         this.usage = new SystemResourceUsage();
         this.overrideBrokerNicSpeedGbps = overrideBrokerNicSpeedGbps;
-        executorService.scheduleAtFixedRate(this::calculateBrokerHostUsage, 0,
-                hostUsageCheckIntervalMin, TimeUnit.MINUTES);
 
         boolean isCGroupsEnabled = false;
         try {
              isCGroupsEnabled = 
Files.exists(Paths.get(CGROUPS_CPU_USAGE_PATH));
         } catch (Exception e) {
             log.warn("Failed to check cgroup CPU usage file: {}", 
e.getMessage());
         }
-
         this.isCGroupsEnabled = isCGroupsEnabled;
-        calculateBrokerHostUsage();

Review comment:
       I don't think that the call to `calculateBrokerHostUsage` should be 
removed. It should remain so that the values are available after the 
constructor returns. Instead of removing this line, the scheduling should 
happen after this call and the initial delay of the scheduling should be set to 
`hostUsageCheckIntervalMin` to prevent the race. 
   
   The problem in the race should also be fixed, I have created a few commits 
in 
https://github.com/apache/pulsar/compare/master...lhotari:issue-6368-flaky-test 
to fix the issue that the concurrent calls to `calculateBrokerHostUsage` caused.
   
   Some suggested improvements to de-risk actual production flakiness:
   * [Make ResourceUsage class 
immutable](https://github.com/lhotari/pulsar/commit/7c5ea55fffd49a2b09b53ca0b1132eef0b1e67cf)
   * [Replace usage of Files.readAllBytes with 
Files.copy](https://github.com/lhotari/pulsar/commit/97a3fb609b6aea81327146d2296412c5599988b2)
   * [Pre-calculate the usage percentage in ResourceUsage 
class](https://github.com/lhotari/pulsar/commit/32aec9de38674f959d090305cb40b14f8c994df7)
   * [Fix initial cpu usage calculation and use System.nanoTime() for 
calculating elapsed 
time](https://github.com/apache/pulsar/commit/468550f471e3e1b4e5071fa9e5fac2f69713a398)
   
   One possibility is to cherry-pick these commits to this same PR.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to