[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-2639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13664768#comment-13664768
 ] 

Sheng Yang commented on CLOUDSTACK-2639:
----------------------------------------

It's a race condition, not related to RvR.

Here is the logic in reconfigLB.sh:
1. Stop the current haproxy instance from listening on the port
2. Sleep 2 seconds
3. Try to start another haproxy instance with the new configuration
4. If that fails, resume the previous haproxy instance.

But during step 2, it's possible that someone else starts another haproxy 
instance (in this case, the VM is still booting and haproxy is started by the 
boot script). Then step 3 fails, and we end up with two haproxy processes 
running. No further LB configuration is allowed after that, because every 
subsequent instance fails with "cannot bind to socket" (only one of the 
haproxy processes is hung up, so the other one causes the conflict).
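
The window in step 2 can be reproduced with plain sockets. This is only a 
rough sketch, not the actual reconfigLB.sh logic: listen_on, reconfigure and 
the during_window hook are hypothetical names, and a listening socket on 
127.0.0.1 stands in for a haproxy instance. It skips the real 2 second sleep 
and just runs the hook at the point where the sleep would be.

```python
import socket


def listen_on(port):
    """Bind and listen on 127.0.0.1:port, like a proxy instance claiming its port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def reconfigure(old, port, during_window=None):
    """Mimic steps 1-3. `during_window` is a hypothetical hook standing in for
    whatever else runs while the script sleeps (here, the boot script)."""
    old.close()  # step 1: the old instance stops listening
    # step 2: the sleep window, where a third party may grab the port
    intruder = during_window(port) if during_window else None
    try:
        new = listen_on(port)  # step 3: start the new instance
    except OSError:
        # The port is already taken, so the new instance cannot bind.
        return "cannot bind to socket", None, intruder
    return "reconfigured", new, intruder
```

With no hook, reconfigure() returns "reconfigured". Passing 
during_window=listen_on makes a second listener take the port inside the 
window, and the call returns "cannot bind to socket" instead.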

Here is the log from the troublesome router:

 May 22 07:49:52 r-196-VM cloud: Loadbalancer public interfaces =  eth2 eth3 
eth4           <---- start reconfigLB.sh
May 22 07:49:54 r-196-VM Keepalived_vrrp: Registering Kernel netlink reflector
May 22 07:49:54 r-196-VM Keepalived_vrrp: Registering Kernel netlink command 
channel
May 22 07:49:54 r-196-VM Keepalived_vrrp: Registering gratutious ARP shared 
channel
May 22 07:49:54 r-196-VM Keepalived_vrrp: IPVS: Can't initialize ipvs: Protocol 
not available
May 22 07:49:54 r-196-VM Keepalived_vrrp: Opening file 
'/etc/keepalived/keepalived.conf'.
May 22 07:49:54 r-196-VM Keepalived_vrrp: Configuration is using : 38859 Bytes
May 22 07:49:54 r-196-VM Keepalived_vrrp: Using LinkWatch kernel netlink 
reflector...
May 22 07:49:54 r-196-VM Keepalived_vrrp: VRRP_Instance(inside_network) 
Entering BACKUP STATE
May 22 07:49:54 r-196-VM Keepalived_vrrp: VRRP_Script(heartbeat) succeeded
May 22 07:49:54 r-196-VM cloud: Starting postinit
May 22 07:49:54 r-196-VM cloud: Starting cloud-passwd-srvr
May 22 07:49:54 r-196-VM cloud: Starting ssh
May 22 07:49:54 r-196-VM cloud: Starting dnsmasq
May 22 07:49:54 r-196-VM cloud: Starting haproxy     <-- haproxy started by 
booting process
May 22 07:49:54 r-196-VM cloud: Starting apache2
May 22 07:49:54 r-196-VM cloud: Stopping cloud
May 22 07:49:54 r-196-VM cloud: Stopping nfs-common
May 22 07:49:54 r-196-VM cloud: Stopping portmap
May 22 07:49:54 r-196-VM cloud: Stopping cloud
May 22 07:49:54 r-196-VM cloud: Stopping nfs-common
May 22 07:49:54 r-196-VM cloud: Stopping portmap
May 22 07:49:54 r-196-VM cloud: Stopping cloud
May 22 07:49:54 r-196-VM cloud: Stopping nfs-common
May 22 07:49:54 r-196-VM cloud: Stopping portmap
May 22 07:49:54 r-196-VM cloud: New instance failed to start, resuming previous 
one.    <---- reconfigLB.sh failed
May 22 07:49:54 r-196-VM cloud: Reconfiguring loadbalancer failed

And this bug has been there since the beginning.

In this case, the mgmt server commands are executed before the VM completes 
the boot process, which results in this trouble.
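
One generic way to keep two such actors from overlapping is an advisory file 
lock that both the boot-time start and the reconfiguration would take first. 
This is only a sketch of the idea and not necessarily the fix used for this 
issue. The lock path and the helper names exclusive and try_exclusive are 
made up for illustration, and fcntl.flock is Unix-only.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock file; a real deployment would pick its own path.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "haproxy-reconfig.lock")


@contextmanager
def exclusive(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until any other holder releases
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def try_exclusive(path=LOCK_PATH):
    """Non-blocking probe: return True if the lock is free right now."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
    finally:
        os.close(fd)
```

If the stop/sleep/start sequence ran inside "with exclusive():" and the boot 
script took the same lock before starting haproxy, the second one to arrive 
would wait instead of racing for the port.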

                
> Occasional error by restart network or virtual router
> -----------------------------------------------------
>
>                 Key: CLOUDSTACK-2639
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-2639
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>    Affects Versions: 4.0.0
>            Reporter: Sheng Yang
>            Assignee: Sheng Yang
>             Fix For: 4.2.0
>
>
> Occasional error by restart network or virtual router
> Sometimes we see occasional errors when restarting a network or virtual router.
> When restarting the network via the API 200 times, we found 3 errors; 
> when restarting the virtual router 200 times, we found 5 errors.
> The errors are found in the management server log, job-130187 for instance.
> ------------------------------------
> failed due to LoadBalancerConfigCommand on domain router 10.129.51.128 failed.
> message: [WARNING] 348/024433 (3249) : config : 'option forwardfor' ignored 
> for
> proxy '210_129_192_105-8080' as it requires HTTP mode.
> ------------------------------------
> VR log's error is below:
> ------------------------------------
> Dec 14 02:44:41 r-17544-VM cloud: Loadbalancer public interfaces = eth2 eth3
> eth4
> Dec 14 02:44:43 r-17544-VM cloud: New instance failed to start, resuming
> previous one.
> Dec 14 02:44:43 r-17544-VM cloud: Reconfiguring loadbalancer failed
> ------------------------------------

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
