siddhantsangwan commented on PR #4415: URL: https://github.com/apache/ozone/pull/4415#issuecomment-1475724957
DU runs on each DN every 60 minutes by default (`hdds.datanode.du.refresh.period`), so SCM only gets updated information about a DN's free space every 60 minutes. Container Balancer uses this information to compare DNs by their space utilisation (the fraction of total capacity that is in use). Container Balancer is stateless between iterations: all space utilisation information is recalculated in every iteration. So it's good to have a default Container Balancer iteration interval that's greater than the DU interval. This prevents the balancer from making moves based on the same stale information that was used in the previous iteration.

If we increase `moveTimeout` to 90 minutes, then the latest we expect moves to complete is around the 90th minute of an iteration. In the worst case, DU runs just before those moves complete, so the space they free or consume isn't picked up until the next DU run, up to 60 minutes later. If our Container Balancer iteration interval is only slightly greater than 90 minutes, the next iteration will start without accounting for the moves made by the previous iteration, because DU hasn't calculated the latest space yet. That's why I think the iteration interval should be greater than 90 + 60 = 150 minutes. 160 minutes seems like a good default to me, since it leaves a 10-minute margin. If anyone wants to make balancing more aggressive, they can enable `trigger.du.before.move.enable` and reduce the iteration interval.
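To make the worst-case timing argument concrete, here is a minimal sketch that brute-forces every possible DU phase and finds the latest minute at which a DU run has started after a move that finished at the 90th minute. `BalancerIntervalSketch` and its methods are hypothetical names for illustration, not Ozone code. The model assumes DU runs strictly every `hdds.datanode.du.refresh.period` minutes, conservatively treats a DU run that starts at the exact minute a move completes as not seeing it, and ignores DU run time and the delay before SCM receives the report.

```java
/**
 * Hypothetical helper (not part of Ozone) reproducing the timing argument.
 * All times are in whole minutes from the start of a balancer iteration.
 */
public final class BalancerIntervalSketch {

    /**
     * First DU run that starts strictly after a move completing at moveDoneAt,
     * given DU runs at duPhase, duPhase + duPeriod, duPhase + 2 * duPeriod, ...
     * A run starting at or before moveDoneAt is assumed not to see the move.
     */
    static long firstDuAfter(long moveDoneAt, long duPeriod, long duPhase) {
        long t = duPhase;
        while (t <= moveDoneAt) {
            t += duPeriod;
        }
        return t;
    }

    /** Worst case over every DU phase in [0, duPeriod). */
    static long worstCaseFreshAt(long moveTimeout, long duPeriod) {
        long worst = 0;
        for (long phase = 0; phase < duPeriod; phase++) {
            worst = Math.max(worst, firstDuAfter(moveTimeout, duPeriod, phase));
        }
        return worst;
    }

    /** An iteration interval is safe only if it exceeds the worst case. */
    static boolean isSafeInterval(long interval, long moveTimeout, long duPeriod) {
        return interval > worstCaseFreshAt(moveTimeout, duPeriod);
    }

    public static void main(String[] args) {
        long duPeriod = 60;    // default hdds.datanode.du.refresh.period
        long moveTimeout = 90; // proposed moveTimeout
        long worst = worstCaseFreshAt(moveTimeout, duPeriod);
        System.out.println("worst case fresh DU at minute " + worst);
        for (long interval : new long[] {100, 150, 160}) {
            System.out.println(interval + " min interval safe? "
                + isSafeInterval(interval, moveTimeout, duPeriod));
        }
    }
}
```

Running `main` reports minute 150 as the worst case (a DU run landing exactly at minute 90 misses the moves, and the next one starts at minute 150), so 100- and 150-minute intervals come out unsafe while 160 minutes clears the bound with 10 minutes to spare.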
