michaeljmarshall commented on pull request #9393: URL: https://github.com/apache/pulsar/pull/9393#issuecomment-771394148
@lhotari - thank you for your detailed explanation. I did not consider running this in a special environment to simulate the testing environment. That is a great point, and something I'll keep in mind in the future.

Based on the example logging output you provided, it looks like the broker is still being overridden because it is considered overloaded. See the following:

```
07:04:53.511 [TestNG-method=testBrokerSelectionForAntiAffinityGroup-1] INFO org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - 2 brokers being considered for assignment of tenant-c8478edb-886e-43b7-8d43-12cc159c0eb9/use/ns1/0x00000000_0xffffffff
07:04:53.511 [TestNG-method=testBrokerSelectionForAntiAffinityGroup-1] WARN org.apache.pulsar.broker.loadbalance.impl.LeastLongTermMessageRate - Broker http://localhost:33199 is overloaded: max usage=1.2440309524536133
07:04:53.511 [TestNG-method=testBrokerSelectionForAntiAffinityGroup-1] WARN org.apache.pulsar.broker.loadbalance.impl.LeastLongTermMessageRate - Broker localhost:33199 is overloaded: CPU: 124.40309%, MEMORY: 19.42325%, DIRECT MEMORY: 2.4414062%, BANDWIDTH IN: 0.0%, BANDWIDTH OUT: 0.0%
07:04:53.516 [TestNG-method=testBrokerSelectionForAntiAffinityGroup-1] INFO org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - 1 brokers being considered for assignment of tenant-c8478edb-886e-43b7-8d43-12cc159c0eb9/use/ns2/0x00000000_0xffffffff
07:04:53.516 [TestNG-method=testBrokerSelectionForAntiAffinityGroup-1] WARN org.apache.pulsar.broker.loadbalance.impl.LeastLongTermMessageRate - Broker http://localhost:33199 is overloaded: max usage=1.2440309524536133
07:04:53.516 [TestNG-method=testBrokerSelectionForAntiAffinityGroup-1] WARN org.apache.pulsar.broker.loadbalance.impl.LeastLongTermMessageRate - Broker localhost:33199 is overloaded: CPU: 124.40309%, MEMORY: 19.42325%, DIRECT MEMORY: 2.4414062%, BANDWIDTH IN: 0.0%, BANDWIDTH OUT: 0.0%
07:04:53.517 [TestNG-method=testBrokerSelectionForAntiAffinityGroup-1] WARN org.apache.pulsar.broker.loadbalance.impl.LeastLongTermMessageRate - Broker http://localhost:33199 is overloaded: max usage=1.2440309524536133
07:04:53.517 [TestNG-method=testBrokerSelectionForAntiAffinityGroup-1] WARN org.apache.pulsar.broker.loadbalance.impl.LeastLongTermMessageRate - Broker localhost:33199 is overloaded: CPU: 124.40309%, MEMORY: 19.42325%, DIRECT MEMORY: 2.4414062%, BANDWIDTH IN: 0.0%, BANDWIDTH OUT: 0.0%
```

Given the first log line, it does look like two brokers are considered for placement. However, the CPU is listed at 124%, which leads to an override. I mentioned a concern about this in my initial PR message:

> Note that I am assuming the following method will never return a value greater than 1, which could lead to test failure.

Perhaps it is worth bumping the limit to something like 300 in this one case? If we only have 2 cores, we won't exceed that. That does, however, leave us with the potential to see this flakiness again if we ever give the test more cores.

Do you think the brokers' CPU utilization is high because they are still in the process of starting up? If so, perhaps your suggested `await` command could help by giving the brokers time to stabilize.
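To make the arithmetic behind the override concrete, here is a minimal sketch of the kind of threshold comparison I believe is rejecting the broker. The class and method names are hypothetical and this is not the actual Pulsar code. The usage fractions are taken from the log above, the 300 is the limit proposed in this comment, and the 85 and 100 thresholds are assumptions added only for comparison.

```java
import java.util.stream.DoubleStream;

public class OverloadCheckSketch {
    // Hypothetical helper: largest per-resource usage, as a fraction (1.0 == 100%).
    static double maxUsage(double... usages) {
        return DoubleStream.of(usages).max().orElse(0.0);
    }

    // Hypothetical check: a broker counts as overloaded when its max usage
    // exceeds the configured threshold percentage.
    static boolean isOverloaded(double maxUsage, double thresholdPercent) {
        return maxUsage > thresholdPercent / 100.0;
    }

    public static void main(String[] args) {
        // CPU, memory, direct memory, bandwidth in, bandwidth out (from the log).
        double usage = maxUsage(1.2440309524536133, 0.1942325, 0.024414062, 0.0, 0.0);

        System.out.println(isOverloaded(usage, 85));  // assumed threshold: overloaded
        System.out.println(isOverloaded(usage, 100)); // still overloaded at 124% CPU
        System.out.println(isOverloaded(usage, 300)); // proposed limit: not overloaded
    }
}
```

With a CPU reading of 124%, any threshold at or below 100 marks the broker as overloaded, while a limit of 300 would not, as long as the reported CPU usage stays below 300%.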
