I am new to this forum, so I apologize beforehand if I don’t present the 
content correctly or leave out information you need. 

Background:  
By no means am I an expert with oVirt and GlusterFS. That said, I have been 
using, managing, and building out oVirt (single hosts) and oVirt with Gluster 
Hyperconverged environments for 5 years or more. 
I started building out oVirt environments with oVirt Engine version 
3.6.7.5-1.el6 and earlier, and now I’m using the latest oVirt with Gluster 
Hyperconverged. 

Current hardware and software layout:
For the last 8 months I have been using an oVirt with Gluster Hyperconverged 
environment to host about 100 VMs in total. 
My hardware layout in one environment is 5 Dell R410 servers: 3 of them are 
configured with Gluster Hyperconverged and the other 2 are just added hosts. 
Below is a detailed list:
Manufacturer: Dell Inc. PowerEdge R410
CPU Model Name: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
CPU Cores per Socket: 4
CPU Type: Intel Westmere IBRS SSBD Family
Dell PERC H700
4 SAS Seagate 4 TB drives 7.2k
2 one-gigabit links – NIC 1 for frontend and NIC 2 for gluster backend 

My software layout is: 
OS Version: RHEL - 7 - 7.1908.0.el7.centos
OS Description: CentOS Linux 7 (Core)
Kernel Version: 3.10.0 - 1062.9.1.el7.x86_64
KVM Version: 2.12.0 - 33.1.el7_7.4
LIBVIRT Version: libvirt-4.5.0-23.el7_7.3
VDSM Version: vdsm-4.30.38-1.el7
SPICE Version: 0.14.0 - 7.el7
GlusterFS Version: glusterfs-6.6-1.el7
CEPH Version: librbd1-10.2.5-4.el7
Open vSwitch Version: openvswitch-2.11.0-4.el7
Kernel Features: PTI: 1, IBRS: 0, RETP: 1, SSBD: 3
VNC Encryption: Disabled

My network layout is:
3 HP 3800-48G-4SFP+ switches (J9576A) running in full mesh




Issue/timeline: 
•       All 3 of the HP 3800 switches were rebooted at the same time and were 
down for 5 to 10 seconds before they came back up (meaning pingable and 
responsive). 
•       A little more than 85% (36 or so) of the VMs I had running went into a 
paused state due to an unknown storage error. 
•       The gluster volume heal count went all the way up to 2300 on vmstore 
(OS data location).
•       After the heal completed on vmstore (it took about an hour), 85% of the 
VMs failed to launch with an error (see below).  
    
VM broadsort is down with error. Exit message: Bad volume specification 
{'protocol': 'gluster', 'address': {'function': '0x0', 'bus': '0x00', 'domain': 
'0x0000', 'type': 'pci', 'slot': '0x06'}, 'serial': 
'b1bf3f56-a453-4383-a350-288bee06445b', 'index': 0, 'iface': 'virtio', 
'apparentsize': '274877906944', 'specParams': {}, 'cache': 'none', 'imageID': 
'b1bf3f56-a453-4383-a350-288bee06445b', 'truesize': '106767498240', 'type': 
'disk', 'domainID': 'a7119613-a5ba-4a97-802b-0a985c647381', 'reqsize': '0', 
'format': 'raw', 'poolID': '699fd2d6-c461-11e9-8b83-00163e18a045', 'device': 
'disk', 'path': 
'vmstore/a7119613-a5ba-4a97-802b-0a985c647381/images/b1bf3f56-a453-4383-a350-288bee06445b/25b0ab77-8f4c-42a1-9416-27db4cd25b39',
 'propagateErrors': 'off', 'name': 'vda', 'bootOrder': '1', 'volumeID': 
'25b0ab77-8f4c-42a1-9416-27db4cd25b39', 'diskType': 'network', 'alias': 
'ua-b1bf3f56-a453-4383-a350-288bee06445b', 'hosts': [{'name': 
'glust01.mydomain.local', 'port': '0'}], 'discard': False}.
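
For anyone who wants to pick the message apart, here is a minimal Python sketch 
that pulls the storage-related fields out of it. It assumes the braces-enclosed 
part of the message is a plain Python dict literal (it looks like one, but I 
have not confirmed that this is guaranteed). The string below is trimmed to a 
few keys with the values copied unchanged, and the names spec_text and 
summarize_spec are made up for this example.

```python
import ast

# Dict portion of the "Bad volume specification" message, trimmed to a few
# keys for brevity. Values are copied unchanged from the error above.
spec_text = (
    "{'protocol': 'gluster', 'format': 'raw', "
    "'apparentsize': '274877906944', 'truesize': '106767498240', "
    "'path': 'vmstore/a7119613-a5ba-4a97-802b-0a985c647381/images/"
    "b1bf3f56-a453-4383-a350-288bee06445b/"
    "25b0ab77-8f4c-42a1-9416-27db4cd25b39', "
    "'hosts': [{'name': 'glust01.mydomain.local', 'port': '0'}], "
    "'discard': False}"
)


def summarize_spec(text):
    """Parse the dict literal and return the fields that locate the image.

    Hypothetical helper: assumes 'path' starts with the gluster volume name
    followed by a path relative to that volume.
    """
    spec = ast.literal_eval(text)  # safe parsing of literals only
    volume, _, image_path = spec["path"].partition("/")
    gib = 1024 ** 3
    return {
        "gluster_volume": volume,
        "image_path": image_path,
        "format": spec["format"],
        "hosts": [host["name"] for host in spec["hosts"]],
        "apparent_gib": int(spec["apparentsize"]) / gib,
        "true_gib": int(spec["truesize"]) / gib,
    }


summary = summarize_spec(spec_text)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The image path that comes out is relative to the vmstore volume, so it could be 
compared against what is actually present on a mount of that volume.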

Every one of the VMs had this same error, and I had to find backups and old 
images to bring them back online. I deleted some of the VMs that I had current 
images of in order to get them back up.

You shouldn’t have to be afraid to reboot 1, 2, 3, or even all of your switches 
at once, whether because of human error, a power outage, or a simple update, 
and then have to worry about your VMs getting corrupted. This concerns me 
greatly and makes me think that I didn’t set up oVirt with Gluster 
Hyperconverged correctly. Have I missed something in the documentation or 
network layout that would prevent this from happening? 

I want to thank you for your time and it is greatly appreciated!