I am new to this forum, so I apologize beforehand if I don't present the content correctly or leave out information you need.
Background:

By no means am I an expert with oVirt and GlusterFS. That said, I have been using, managing, and building out oVirt (single hosts) and oVirt with Gluster hyperconverged environments for 5 years or more. I started building oVirt environments with oVirt Engine version 3.6.7.5-1.el6 and earlier, and I am now running the latest oVirt with Gluster hyperconverged.

Current hardware and software layout:

For the last 8 months I have been using oVirt with Gluster hyperconverged to host about 100 VMs in total. My hardware layout in one environment is 5 Dell R410s: 3 of them configured with Gluster hyperconverged, and the other 2 are just added hosts. Below is a detailed list:

Manufacturer: Dell Inc. PowerEdge R410
CPU Model Name: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
CPU Cores per Socket: 4
CPU Type: Intel Westmere IBRS SSBD Family
Storage: Dell PERC H700 with 4 Seagate 4 TB 7.2k SAS drives
Network: 2 one-gig links, NIC 1 for the frontend and NIC 2 for the Gluster backend

My software layout is:

OS Version: RHEL - 7 - 7.1908.0.el7.centos
OS Description: CentOS Linux 7 (Core)
Kernel Version: 3.10.0 - 1062.9.1.el7.x86_64
KVM Version: 2.12.0 - 33.1.el7_7.4
LIBVIRT Version: libvirt-4.5.0-23.el7_7.3
VDSM Version: vdsm-4.30.38-1.el7
SPICE Version: 0.14.0 - 7.el7
GlusterFS Version: glusterfs-6.6-1.el7
CEPH Version: librbd1-10.2.5-4.el7
Open vSwitch Version: openvswitch-2.11.0-4.el7
Kernel Features: PTI: 1, IBRS: 0, RETP: 1, SSBD: 3
VNC Encryption: Disabled

My network layout is:

3 HP 3800-48G-4SFP+ switches (J9576A) running in a FULL MESH

Issue/timeline:

• All 3 of the HP 3800s were rebooted at the same time and were down for 5 to 10 seconds before they came back up (meaning pingable and responsive).
• A little more than 85% (36 or so) of the running VMs went into a paused state due to an unknown storage error.
• The gluster volume heal count went all the way up to 2300 on vmstore (the OS data location).
• After the heal completed on vmstore (it took about an hour), 85% of the VMs failed to launch with an error like the one below.

VM broadsort is down with error. Exit message: Bad volume specification {'protocol': 'gluster', 'address': {'function': '0x0', 'bus': '0x00', 'domain': '0x0000', 'type': 'pci', 'slot': '0x06'}, 'serial': 'b1bf3f56-a453-4383-a350-288bee06445b', 'index': 0, 'iface': 'virtio', 'apparentsize': '274877906944', 'specParams': {}, 'cache': 'none', 'imageID': 'b1bf3f56-a453-4383-a350-288bee06445b', 'truesize': '106767498240', 'type': 'disk', 'domainID': 'a7119613-a5ba-4a97-802b-0a985c647381', 'reqsize': '0', 'format': 'raw', 'poolID': '699fd2d6-c461-11e9-8b83-00163e18a045', 'device': 'disk', 'path': 'vmstore/a7119613-a5ba-4a97-802b-0a985c647381/images/b1bf3f56-a453-4383-a350-288bee06445b/25b0ab77-8f4c-42a1-9416-27db4cd25b39', 'propagateErrors': 'off', 'name': 'vda', 'bootOrder': '1', 'volumeID': '25b0ab77-8f4c-42a1-9416-27db4cd25b39', 'diskType': 'network', 'alias': 'ua-b1bf3f56-a453-4383-a350-288bee06445b', 'hosts': [{'name': 'glust01.mydomain.local', 'port': '0'}], 'discard': False}.

Every one of the VMs had this same error, and I had to find backups and old images to bring them back online. I deleted some of the VMs that I had current images of in order to get them back up.

You shouldn't be afraid to reboot 1, 2, 3, or even all of your switches at once because of human error, a power outage, or a simple update. Having to worry about VMs getting corrupted by something like this concerns me greatly and makes me suspect I didn't set up oVirt with Gluster hyperconverged correctly. Have I missed something in the documentation or the network layout that would prevent this from happening?

I want to thank you for your time; it is greatly appreciated!
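For reference, this is roughly how I have been checking the heal backlog and the quorum/timeout options on the volume while trying to understand the failure. These are standard Gluster CLI commands; "vmstore" is my volume name, and the options shown are just the ones I am inspecting, not values I am recommending:

```shell
# Summary of the self-heal backlog on vmstore (run on any gluster host).
gluster volume heal vmstore info summary

# Inspect the quorum and timeout options that govern what happens when
# bricks lose network connectivity, e.g. during a switch reboot.
gluster volume get vmstore cluster.quorum-type
gluster volume get vmstore cluster.server-quorum-type
gluster volume get vmstore network.ping-timeout
```

I am happy to post the actual output of these if it helps with the diagnosis.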
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/JNAKRCLRZSCJE5HXFEVOO2B4IGXZQ7BV/