Status: New
Owner: ----
New issue 899 by [email protected]: gnt-cluster verify ignoring one VM whose disks were degraded
http://code.google.com/p/ganeti/issues/detail?id=899
What software version are you running? Please provide the output of "gnt-cluster --version", "gnt-cluster version", and "hspace --version".
What distribution are you using?
# gnt-cluster --version
gnt-cluster (ganeti v2.11.3) 2.11.3
# gnt-cluster version
Software version: 2.11.3
Internode protocol: 2110000
Configuration format: 2110000
OS api version: 20
Export interface: 0
VCS version: (ganeti) version v2.11.3
# hspace --version
hspace (ganeti) version v2.11.3
compiled with ghc 7.4
running on linux x86_64
# cat /etc/debian_version
7.6
# apt-cache policy ganeti
ganeti:
  Installed: 2.11.3-2~bpo70+1
  Candidate: 2.11.3-2~bpo70+1
  Package pin: 2.11.3-2~bpo70+1
  Version table:
 *** 2.11.3-2~bpo70+1 990
        100 http://debian.xxxxxxxx.de/debian/ wheezy-backports/main amd64 Packages
        100 /var/lib/dpkg/status
     2.10.5-1~bpo70+1 990
        100 http://debian.xxxxxxxx.de/debian/ wheezy-backports/main amd64 Packages
     2.9.5-1~bpo70+1 990
        100 http://debian.xxxxxxxx.de/debian/ wheezy-backports/main amd64 Packages
What steps will reproduce the problem?
1. Take the secondary network down for a moment.
2. Run gnt-cluster verify (it complained about the other VMs, but missed that one).
3. Wait until the instance disks are in sync (drbd23 and drbd7 still stayed degraded).
4. Run gnt-cluster verify again.
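The sync check in step 3 can be sketched like this (an illustration only, not Ganeti's own check; on a live cluster you would pipe the output of "gnt-instance info zzzzzz" instead of the sample line):

```shell
# Count per-disk status lines still marked *DEGRADED*.
# On a live cluster:  gnt-instance info zzzzzz | grep -c 'DEGRADED'
# A sample line from this report stands in so the sketch is runnable:
printf 'on primary: /dev/drbd5 (147:5) in sync, status *DEGRADED*\n' \
  | grep -c 'DEGRADED'
# prints 1 (one replica still degraded); 0 would mean fully in sync
```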
What is the expected output? What do you see instead?
verify did not complain about the disks being degraded; the lines prefixed with "+" below mark the errors that should have appeared.
root@node1 ~ # gnt-cluster verify
Submitted jobs 154434, 154435
Waiting for job 154434 ...
Sat Jul 26 14:19:49 2014 * Verifying cluster config
Sat Jul 26 14:19:49 2014 * Verifying cluster certificate files
Sat Jul 26 14:19:49 2014 * Verifying hypervisor parameters
Sat Jul 26 14:19:49 2014 * Verifying all nodes belong to an existing group
Waiting for job 154435 ...
Sat Jul 26 14:19:50 2014 * Verifying group 'default'
Sat Jul 26 14:19:50 2014 * Gathering data (2 nodes)
Sat Jul 26 14:19:51 2014 * Gathering disk information (2 nodes)
Sat Jul 26 14:19:56 2014 * Verifying configuration file consistency
Sat Jul 26 14:19:56 2014 * Verifying node status
Sat Jul 26 14:19:56 2014 * Verifying instance status
+ Sat Jul 26 14:08:00 2014 - ERROR: instance zzzzzz.yyyyyy.xxxxxxxxx.de: disk/0 on node1.yyyyyy.xxxxxxxxx.de is degraded
+ Sat Jul 26 14:08:00 2014 - ERROR: instance zzzzzz.yyyyyy.xxxxxxxxx.de: disk/1 on node1.yyyyyy.xxxxxxxxx.de is degraded
+ Sat Jul 26 14:08:00 2014 - ERROR: instance zzzzzz.yyyyyy.xxxxxxxxx.de: disk/0 on node2.yyyyyy.xxxxxxxxx.de is degraded
+ Sat Jul 26 14:08:00 2014 - ERROR: instance zzzzzz.yyyyyy.xxxxxxxxx.de: disk/1 on node2.yyyyyy.xxxxxxxxx.de is degraded
Sat Jul 26 14:19:56 2014 * Verifying orphan volumes
Sat Jul 26 14:19:56 2014 * Verifying N+1 Memory redundancy
Sat Jul 26 14:19:56 2014 * Other Notes
Sat Jul 26 14:19:56 2014 * Hooks Results
Please provide any additional information below.
The first job, "154414 error INSTANCE_ACTIVATE_DISKS[...]", failed.
The second job, "154422 success INSTANCE_ACTIVATE_DISKS[...]", was successful, but no longer included that VM.
After reactivating the disks manually (gnt-instance activate-disks zzzzzz), everything went back to normal.
root@node1 ~ # gnt-instance info zzzzzz
[...]
- disk/0: drbd, size 32.0G
  access mode: rw
  nodeA: node2.yyyyyy.xxxxxxxxx.de, minor=5
  nodeB: node1.yyyyyy.xxxxxxxxx.de, minor=7
  port: 11021
  auth key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  on primary: /dev/drbd5 (147:5) in sync, status *DEGRADED*
  on secondary: /dev/drbd7 (147:7) in sync, status *DEGRADED*
[...]
- disk/1: drbd, size 128.0G
  access mode: rw
  nodeA: node2.yyyyyy.xxxxxxxxx.de, minor=23
  nodeB: node1.yyyyyy.xxxxxxxxx.de, minor=23
  port: 11045
  auth key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  on primary: /dev/drbd23 (147:23) in sync, status *DEGRADED*
  on secondary: /dev/drbd23 (147:23) in sync, status *DEGRADED*
[...]
root@node1 ~ # cat /proc/drbd
[...]
 7: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:1937100 dr:0 al:0 bm:29 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[...]
23: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
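The WFConnection/DUnknown state above is exactly what verify should have flagged. A minimal sketch of spotting it in /proc/drbd (the field layout is assumed from the sample above; this is not Ganeti's actual parser):

```shell
# Print every DRBD minor whose connection state is not "Connected"
# (e.g. WFConnection, as in this report). The heredoc holds the two
# status lines from above so the sketch runs standalone.
cat <<'EOF' | awk '$2 ~ /^cs:/ && $2 != "cs:Connected" { sub(":", "", $1); print "drbd" $1 " not connected: " $2 }'
 7: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
23: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
EOF
```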