andrijapanicsb opened a new issue #3895: Algorithm not set in agent.properties, when adding a new KVM hosts
URL: https://github.com/apache/cloudstack/issues/3895

CloudStack agent, software load balancing logic

### how it works / supposed to work

When the global settings are as follows (the IP values are examples, showing 2 IP addresses for 2 mgmt hosts):

- host=10.2.2.144,10.2.2.118
- indirect.agent.lb.algorithm=roundrobin (or static)
- indirect.agent.lb.check.interval = xxx (different from zero)

the behaviour we get is that the list of mgmt servers is randomised due to using "roundrobin" (or always kept in the same order as it appears in the "host" value, if we set "static") before being sent to all connected agents. So some agents will have "x.x.x.144,x.x.x.118@roundrobin" while others will have "x.x.x.118,x.x.x.144@roundrobin" in their agent.properties file.

The presence of "@roundrobin" or "@static" in agent.properties is needed for the background task to run. That task says: "OK, this is "@static" or "@roundrobin", i.e. NOT "@shuffle", so I'm going to read the value of indirect.agent.lb.check.interval and then check every indirect.agent.lb.check.interval seconds whether the preferred host (which died previously) is up again, and fail back (reconnect) to the first host from the list of hosts in agent.properties."

If the algorithm is "shuffle", or if it's missing from agent.properties, the task will assume "shuffle". This means it will internally set indirect.agent.lb.check.interval=0 and thus will NEVER try to reconnect to the first host from the list of hosts in agent.properties. So if the first mshost dies, the agent will connect to the next one, but when the first host is back again, the agent will NOT reconnect to it, since the algorithm is "shuffle" or missing from agent.properties.
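To make the described decision concrete, here is a minimal Python sketch of the logic as this report states it. It is not the CloudStack agent's actual code (the agent is written in Java); the function names `parse_host_value` and `effective_check_interval` are hypothetical, and the only behaviour modelled is what the description above claims: a missing or "shuffle" algorithm forces the check interval to 0.

```python
from typing import List, Optional, Tuple

# Algorithms that, per the description in this issue, keep the fail-back check enabled.
FAILBACK_ALGORITHMS = {"static", "roundrobin"}


def parse_host_value(value: str) -> Tuple[List[str], Optional[str]]:
    """Split 'h1,h2@algorithm' into (hosts, algorithm); algorithm is None if absent."""
    hosts_part, sep, algorithm_part = value.strip().partition("@")
    hosts = [h.strip() for h in hosts_part.split(",") if h.strip()]
    algorithm = algorithm_part.strip().lower() if sep else ""
    return hosts, (algorithm or None)


def effective_check_interval(value: str, configured_interval: int) -> int:
    """Return the interval (seconds) the background task would effectively use.

    Assumption taken from the issue text: a missing algorithm is treated as
    'shuffle', and 'shuffle' forces the interval to 0 (no fail-back).
    """
    _, algorithm = parse_host_value(value)
    if (algorithm or "shuffle") in FAILBACK_ALGORITHMS:
        return configured_interval
    return 0
```

With a configured interval of 60, the value "10.2.2.144,10.2.2.118@roundrobin" keeps the interval at 60, while the bare list "10.2.2.144,10.2.2.118" yields 0, which is the no-fail-back case.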
### BUG

When adding a new KVM host while having the global settings as described above, the new host will only have the list of hosts, withOUT the algorithm, in its agent.properties. This means that if the first host is down, the agent will reconnect (fail over) to the second host from the list, but when the first server is back online, it will NOT reconnect (fail back) to the first mshost, even though the global setting (indirect.agent.lb.check.interval = xxx) says it should do so.

Not a critical bug, but it creates inconsistency in agent "fail-back" behavior between the existing and freshly added KVM hosts.

Workaround: change the host list once, then change it back to what it was before ("host" setting), and everything will be properly propagated to all connected agents (including those freshly added KVM hosts).
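One way to spot affected hosts is to check whether the "host" entry in a given agent.properties carries an "@algorithm" suffix. The following is a rough Python sketch under that assumption; `find_host_value` and `is_affected` are hypothetical helper names, and the parser handles only simple `key=value` lines rather than the full Java properties syntax.

```python
from typing import Optional


def find_host_value(properties_text: str) -> Optional[str]:
    """Return the value of the 'host' key from agent.properties-style text, or None."""
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and '#' comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "host":
            return value.strip()
    return None


def is_affected(properties_text: str) -> bool:
    """True if a host list is present but has no '@algorithm' suffix."""
    value = find_host_value(properties_text)
    return value is not None and "@" not in value
```

For example, `is_affected("host=10.2.2.144,10.2.2.118\n")` reports the host as affected, while the same list with "@roundrobin" appended does not.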
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
