[
https://issues.apache.org/jira/browse/CLOUDSTACK-9385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941531#comment-15941531
]
Patrick commented on CLOUDSTACK-9385:
-------------------------------------
I just upgraded from 4.5.2 to 4.9.2 on Xen and am impacted by the same issue.
A few additional clarifications:
It happened when the RvRs were recreated to apply the new SVM template. When the
RvRs are rebooted to install the new SVM version, both routers in the pair always
end up in BACKUP state, whether I do a VR reboot, a network restart with cleanup,
a stop/start, etc.
To fix it, I had to find which of the two VRs was logging the error "Password
server failed with error code 1. Restarting it...", restart the password
service on it, and then restart the VR, after which it would gain its MASTER
state. From that point on, the role switch between the two VRs goes smoothly
until either VR is recreated. This is pretty ugly, so I'm switching my RvRs to
standalone to avoid this issue.
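Finding the affected router comes down to checking each VR's syslog for the
error quoted above. A minimal Python sketch of that check follows. The function
name and the regex are illustrative assumptions, not anything shipped in the
system VM template, and the log text is expected to be read from
/var/log/messages (the path given in the original report).

```python
import re

# Matches the syslog message quoted above. The exit code is captured so that
# failures with codes other than 1 are counted as well.
FAILURE_RE = re.compile(r"Password server failed with error code (\d+)")


def count_password_server_failures(log_text):
    """Return a dict mapping error code -> number of matching log lines."""
    counts = {}
    for line in log_text.splitlines():
        match = FAILURE_RE.search(line)
        if match:
            code = int(match.group(1))
            counts[code] = counts.get(code, 0) + 1
    return counts
```

Run against the log contents of each VR in the pair, a non-empty result points
at the router whose password service likely needs the restart.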
> Password Server is not running on RvR
> -------------------------------------
>
> Key: CLOUDSTACK-9385
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9385
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the default.)
> Components: ISO, SystemVM
> Affects Versions: 4.6.0, 4.6.1, 4.6.2, 4.7.0, 4.7.1, 4.8.0
> Reporter: dsclose
>
> NB: I have not tested this on VPC routers.
> The cloud-passwd-srvr service fails on redundant virtual routers. This
> appears to only concern redundant virtual routers. Standalone routers launch
> the password server successfully, as per this bash session:
> {code:title=Standalone Router}
> root@r-3775-VM:~# ps aux | grep passwd | grep -v grep
> root 2257 0.0 0.5 9244 1328 ? S 14:27 0:00 /bin/bash /opt/cloud/bin/passwd_server_ip 10.1.1.1 dummy
> root 2259 0.0 3.2 37276 8128 ? S 14:27 0:00 python /opt/cloud/bin/passwd_server_ip.py 10.1.1.1
> root@r-3775-VM:~# netstat -tnlp | grep 2259
> tcp 0 0 10.1.1.1:8080 0.0.0.0:* LISTEN 2259/python
> {code}
> However, redundant virtual routers do not exhibit this behaviour. Instead,
> the password server process is running without an IP argument. No matching
> process is bound to any ports:
> {code:title=Master Redundant Virtual Router}
> root@r-3776-VM:~# ps aux | grep passwd | grep -v grep
> root 5152 0.0 0.2 17684 1516 ? S 14:38 0:00 /bin/bash /opt/cloud/bin/passwd_server_ip None dummy
> root@r-3776-VM:~# netstat -ntlp | grep 5152
> root@r-3776-VM:~#
> {code}
> Further, an error message is being repeated in /var/log/messages:
> {code:title=/var/log/messages}
> May 24 14:53:07 r-3776-VM cloud: Password server failed with error code 1. Restarting it...
> May 24 14:53:11 r-3776-VM cloud: Password server failed with error code 1. Restarting it...
> May 24 14:53:14 r-3776-VM cloud: Password server failed with error code 1. Restarting it...
> May 24 14:53:17 r-3776-VM cloud: Password server failed with error code 1. Restarting it...
> May 24 14:53:20 r-3776-VM cloud: Password server failed with error code 1. Restarting it...
> May 24 14:53:23 r-3776-VM cloud: Password server failed with error code 1. Restarting it...
> May 24 14:53:26 r-3776-VM cloud: Password server failed with error code 1. Restarting it...
> May 24 14:53:29 r-3776-VM cloud: Password server failed with error code 1. Restarting it...
> {code}
> No process is bound to the password server port. Consequently, attempts to
> request a password from the password server get rejected.
> Manually restarting the cloud-passwd-srvr resolves this issue immediately:
> {code:title=Master Redundant Virtual Router}
> root@r-3776-VM:~# service cloud-passwd-srvr restart
> Killed password server (pid=4874)
> iptables: Bad rule (does a matching rule exist in that chain?).
> Removed cloud-passwd-srvr iptables rules
> Stopped password server (pid=5152)
> iptables: Bad rule (does a matching rule exist in that chain?).
> Removed cloud-passwd-srvr iptables rules
> Added cloud-passwd-srvr iptables rules
> root@r-3776-VM:~# nohup: appending output to `nohup.out'
> root@r-3776-VM:~# ps aux | grep passwd | grep -v grep
> root 15776 0.0 0.3 19436 1576 pts/1 S 15:05 0:00 /bin/bash /opt/cloud/bin/passwd_server_ip 10.1.1.250
> root 15780 0.2 1.6 45484 8304 pts/1 S 15:05 0:00 python /opt/cloud/bin/passwd_server_ip.py 10.1.1.250
> root 15781 0.0 0.3 19436 1572 pts/1 S 15:05 0:00 /bin/bash /opt/cloud/bin/passwd_server_ip 10.1.1.1
> root 15782 0.2 1.6 49692 8396 pts/1 S 15:05 0:00 python /opt/cloud/bin/passwd_server_ip.py 10.1.1.1
> root@r-3776-VM:~# netstat -ntlp | grep 15780
> tcp 0 0 10.1.1.250:8080 0.0.0.0:* LISTEN 15780/python
> {code}
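The symptom in the description above, where the wrapper is started as
`passwd_server_ip None dummy`, can be detected from `ps aux` output. Below is a
minimal Python sketch, assuming unwrapped `ps aux` lines with the PID in the
second column. `find_unbound_password_servers` is a hypothetical helper name,
not part of CloudStack.

```python
import ipaddress


def find_unbound_password_servers(ps_output):
    """Return PIDs of passwd_server_ip wrappers started without a valid IP."""
    bad_pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Need at least USER and PID columns, and a numeric PID.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        # Locate the bash wrapper script (not the .py server) in the command.
        script_idx = next(
            (i for i, f in enumerate(fields) if f.endswith("/passwd_server_ip")),
            None,
        )
        if script_idx is None:
            continue
        # The argument after the script path should be the listen address.
        arg = fields[script_idx + 1] if script_idx + 1 < len(fields) else ""
        try:
            ipaddress.ip_address(arg)
        except ValueError:
            # For example the literal string "None" seen on the redundant router.
            bad_pids.append(int(fields[1]))
    return bad_pids
```

Fed the `ps aux` lines from the redundant router above, this would flag the
wrapper whose address argument is `None`, while the standalone router's
processes pass the check.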