Hi,

We have an oVirt cluster with 4 hosts and a hosted engine running on one of 
them (all of the nodes provide the storage via GlusterFS).
Currently there are 53 VMs running.
The oVirt Engine version is 4.2.8.2-1.el7 and GlusterFS is 3.12.15.

For the past week, we have been seeing multiple events in the oVirt UI about 
GetGlusterVolumeHealInfoVDS, from all the nodes at random, roughly one ERROR 
event every ~13 minutes.

Sample from the Events dashboard:
May 4, 2020, 2:32:14 PM - Status of host <host-1> was set to Up.
May 4, 2020, 2:32:11 PM - Manually synced the storage devices from host <host-1>
May 4, 2020, 2:31:55 PM - Host <host-1> is not responding. Host cannot be 
fenced automatically because power management for the host is disabled.
May 4, 2020, 2:31:55 PM - VDSM <host-1> command GetGlusterVolumeHealInfoVDS 
failed: Message timeout which can be caused by communication issues

May 4, 2020, 2:19:14 PM - Status of host <host-2> was set to Up.
May 4, 2020, 2:19:12 PM - Manually synced the storage devices from host <host-2>
May 4, 2020, 2:18:49 PM - Host <host-2> is not responding. Host cannot be 
fenced automatically because power management for the host is disabled.
May 4, 2020, 2:18:49 PM - VDSM <host-2> command GetGlusterVolumeHealInfoVDS 
failed: Message timeout which can be caused by communication issues

May 4, 2020, 2:05:55 PM - Status of host <host-2> was set to Up.
May 4, 2020, 2:05:54 PM - Manually synced the storage devices from host <host-2>
May 4, 2020, 2:05:35 PM - Host <host-2> is not responding. Host cannot be 
fenced automatically because power management for the host is disabled.
May 4, 2020, 2:05:35 PM - VDSM <host-2> command GetGlusterVolumeHealInfoVDS 
failed: Message timeout which can be caused by communication issues

May 4, 2020, 1:52:45 PM - Status of host <host-3> was set to Up.
May 4, 2020, 1:52:44 PM - Manually synced the storage devices from host <host-3>
May 4, 2020, 1:52:22 PM - Host <host-3> is not responding. Host cannot be 
fenced automatically because power management for the host is disabled.
May 4, 2020, 1:52:22 PM - VDSM <host-3> command GetGlusterVolumeHealInfoVDS 
failed: Message timeout which can be caused by communication issues

May 4, 2020, 1:39:11 PM - Status of host <host-4> was set to Up.
May 4, 2020, 1:39:11 PM - Manually synced the storage devices from host <host-4>
May 4, 2020, 1:39:11 PM - Host <host-4> is not responding. Host cannot be 
fenced automatically because power management for the host is disabled.
May 4, 2020, 1:39:11 PM - VDSM <host-4> command GetGlusterVolumeHealInfoVDS 
failed: Message timeout which can be caused by communication issues

May 4, 2020, 1:26:29 PM - Status of host <host-3> was set to Up.
May 4, 2020, 1:26:28 PM - Manually synced the storage devices from host <host-3>
May 4, 2020, 1:26:11 PM - Host <host-3> is not responding. Host cannot be 
fenced automatically because power management for the host is disabled.
May 4, 2020, 1:26:11 PM - VDSM <host-3> command GetGlusterVolumeHealInfoVDS 
failed: Message timeout which can be caused by communication issues

May 4, 2020, 1:13:10 PM - Status of host <host-1> was set to Up.
May 4, 2020, 1:13:08 PM - Manually synced the storage devices from host <host-1>
May 4, 2020, 1:12:51 PM - Host <host-1> is not responding. Host cannot be 
fenced automatically because power management for the host is disabled.
May 4, 2020, 1:12:51 PM - VDSM <host-1> command GetGlusterVolumeHealInfoVDS 
failed: Message timeout which can be caused by communication issues
...and so on.

When I look at the Compute > Hosts dashboard, the host's status shows DOWN at 
the moment the VDSM event (GetGlusterVolumeHealInfoVDS failed) pops up, and 
it is set back to UP almost immediately.
FYI: while a host's status is DOWN, the VMs running on it do not migrate, 
and everything keeps running perfectly fine.
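For reference, one thing I can try is running the same heal-info query manually on a node to see how long it takes; if a slow brick or a large heal backlog pushes it past VDSM's message timeout, that could explain the events above (VOLNAME below is a placeholder for the actual volume name):

```shell
# Time the heal-info query that VDSM issues periodically; if this
# takes longer than VDSM's message timeout, events like the ones
# above would be expected. VOLNAME is a placeholder for the real
# Gluster volume name.
time gluster volume heal VOLNAME info

# Check the VDSM log on the affected host around the event time
# for the heal-info call and any timeout traces.
grep -i 'healinfo' /var/log/vdsm/vdsm.log | tail -n 20
```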

This is happening all day. Is there something I can troubleshoot? I would 
appreciate your comments.
_______________________________________________
Infra mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/GNE3QC7GLEER4ZPHGP3H6M27DPSKCQO3/
