Status: New
Owner: ----

New issue 899 by [email protected]: gnt-cluster verify ignoring one VM whose disks were degraded
http://code.google.com/p/ganeti/issues/detail?id=899

What software version are you running? Please provide the output of
"gnt-cluster --version", "gnt-cluster version", and "hspace --version".

# gnt-cluster --version
gnt-cluster (ganeti v2.11.3) 2.11.3
# gnt-cluster version
Software version: 2.11.3
Internode protocol: 2110000
Configuration format: 2110000
OS api version: 20
Export interface: 0
VCS version: (ganeti) version v2.11.3
# hspace --version
hspace (ganeti) version v2.11.3
compiled with ghc 7.4
running on linux x86_64

What distribution are you using?
# cat /etc/debian_version
7.6
# apt-cache policy ganeti
ganeti:
  Installed: 2.11.3-2~bpo70+1
  Candidate: 2.11.3-2~bpo70+1
  Package pin: 2.11.3-2~bpo70+1
  Version table:
 *** 2.11.3-2~bpo70+1 990
        100 http://debian.xxxxxxxx.de/debian/ wheezy-backports/main amd64 Packages
        100 /var/lib/dpkg/status
     2.10.5-1~bpo70+1 990
        100 http://debian.xxxxxxxx.de/debian/ wheezy-backports/main amd64 Packages
     2.9.5-1~bpo70+1 990
        100 http://debian.xxxxxxxx.de/debian/ wheezy-backports/main amd64 Packages


What steps will reproduce the problem?
1. take the secondary network down for a moment
2. run gnt-cluster verify (it complained about the other VMs, but missed this one)
3. wait until the instance disks are in sync (drbd23 and drbd7 still stayed degraded)
4. run gnt-cluster verify again (a rough shell sketch of these steps follows below)
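
For clarity, a rough shell sketch of the reproduction. The interface name
eth1 for the secondary (replication) network is an assumption; adjust it to
your setup:

  # take the secondary network down for a moment (eth1 is hypothetical)
  ifdown eth1 && sleep 30 && ifup eth1
  # verify complains about the other VMs, but misses this one
  gnt-cluster verify
  # wait for resync; drbd7 and drbd23 stayed at WFConnection/degraded
  watch cat /proc/drbd
  # verify again -- still no complaint about the degraded instance
  gnt-cluster verify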


What is the expected output? What do you see instead?
Expected: verify complains about this instance's degraded disks (the '+'
lines below mark the errors that were missing). Instead, verify stayed
silent about them:

  root@node1 ~ # gnt-cluster verify
  Submitted jobs 154434, 154435
  Waiting for job 154434 ...
  Sat Jul 26 14:19:49 2014 * Verifying cluster config
  Sat Jul 26 14:19:49 2014 * Verifying cluster certificate files
  Sat Jul 26 14:19:49 2014 * Verifying hypervisor parameters
  Sat Jul 26 14:19:49 2014 * Verifying all nodes belong to an existing group
  Waiting for job 154435 ...
  Sat Jul 26 14:19:50 2014 * Verifying group 'default'
  Sat Jul 26 14:19:50 2014 * Gathering data (2 nodes)
  Sat Jul 26 14:19:51 2014 * Gathering disk information (2 nodes)
  Sat Jul 26 14:19:56 2014 * Verifying configuration file consistency
  Sat Jul 26 14:19:56 2014 * Verifying node status
  Sat Jul 26 14:19:56 2014 * Verifying instance status
+ Sat Jul 26 14:08:00 2014 - ERROR: instance zzzzzz.yyyyyy.xxxxxxxxx.de: disk/0 on node1.yyyyyy.xxxxxxxxx.de is degraded
+ Sat Jul 26 14:08:00 2014 - ERROR: instance zzzzzz.yyyyyy.xxxxxxxxx.de: disk/1 on node1.yyyyyy.xxxxxxxxx.de is degraded
+ Sat Jul 26 14:08:00 2014 - ERROR: instance zzzzzz.yyyyyy.xxxxxxxxx.de: disk/0 on node2.yyyyyy.xxxxxxxxx.de is degraded
+ Sat Jul 26 14:08:00 2014 - ERROR: instance zzzzzz.yyyyyy.xxxxxxxxx.de: disk/1 on node2.yyyyyy.xxxxxxxxx.de is degraded
  Sat Jul 26 14:19:56 2014 * Verifying orphan volumes
  Sat Jul 26 14:19:56 2014 * Verifying N+1 Memory redundancy
  Sat Jul 26 14:19:56 2014 * Other Notes
  Sat Jul 26 14:19:56 2014 * Hooks Results


Please provide any additional information below.
The first job, "154414 error   INSTANCE_ACTIVATE_DISKS[...]", failed.
The second job, "154422 success INSTANCE_ACTIVATE_DISKS[...]", succeeded, but no longer contained the VM.

After reactivating the disks manually (gnt-instance activate-disks zzzzzz), everything went back to normal.
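
A minimal sketch of the manual recovery and the checks used to confirm it,
using the same instance name as above:

  gnt-instance activate-disks zzzzzz   # re-activate and reconnect the DRBD disks
  gnt-instance info zzzzzz             # disk status should no longer show *DEGRADED*
  cat /proc/drbd                       # healthy: cs:Connected ds:UpToDate/UpToDate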

root@node1 ~ # gnt-instance info zzzzzz
[...]
    - disk/0: drbd, size 32.0G
      access mode: rw
      nodeA: node2.yyyyyy.xxxxxxxxx.de, minor=5
      nodeB: node1.yyyyyy.xxxxxxxxx.de, minor=7
      port: 11021
      auth key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      on primary: /dev/drbd5 (147:5) in sync, status *DEGRADED*
      on secondary: /dev/drbd7 (147:7) in sync, status *DEGRADED*
[...]
    - disk/1: drbd, size 128.0G
      access mode: rw
      nodeA: node2.yyyyyy.xxxxxxxxx.de, minor=23
      nodeB: node1.yyyyyy.xxxxxxxxx.de, minor=23
      port: 11045
      auth key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
      on primary: /dev/drbd23 (147:23) in sync, status *DEGRADED*
      on secondary: /dev/drbd23 (147:23) in sync, status *DEGRADED*
[...]

root@node1 ~ # cat /proc/drbd
[...]
 7: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:1937100 dr:0 al:0 bm:29 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[...]
23: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
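
Until verify reports this correctly, a hedged sketch of a quick per-node
check for DRBD resources stuck waiting for their peer (the exact output
field selection for gnt-instance list is an assumption):

  grep -c WFConnection /proc/drbd      # resources still waiting for their peer
  gnt-instance list -o name,status     # compare with what Ganeti believes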
