This PR doesn't seem to completely fix the reboot problem. We installed the RC release with this PR on a test system and are still able to get the KVM host to reboot by using iptables to drop outgoing requests to NFS. My investigation shows that the line [`storage = conn.storagePoolLookupByUUIDString(uuid);`](https://github.com/apache/cloudstack/blob/4.11/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/KVMHAMonitor.java#L95) blocks indefinitely. As a result, `kvmheartbeat.sh` is never executed, a host investigation is started, the host with blocked NFS is marked as `Down`, and finally all VMs on that host are rescheduled, resulting in duplicate VMs.
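One possible direction, as an untested sketch rather than anything this PR does, would be to bound the blocking lookup with a timeout so the monitor can still react when NFS stops responding. The class `BoundedLookup`, the helper `callWithTimeout`, and the 200 ms timeout are hypothetical, and the `Callable` in `main` is only a stand-in for the libvirt call:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedLookup {

    /**
     * Runs a possibly blocking call on a daemon worker thread.
     * Returns the call's result, or null if it does not finish within timeoutMs.
     */
    public static <T> T callWithTimeout(Callable<T> call, long timeoutMs) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "pool-lookup");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts Java-level blocking only
            return null;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for conn.storagePoolLookupByUUIDString(uuid)
        // while outgoing NFS traffic is being dropped.
        Callable<String> blockedLookup = () -> {
            Thread.sleep(60_000);
            return "pool";
        };
        String storage = callWithTimeout(blockedLookup, 200);
        System.out.println(storage == null ? "lookup timed out" : "found " + storage);
    }
}
```

With the stand-in above this prints `lookup timed out`. A real native call made through JNA would probably ignore the interrupt, so the worker thread would stay stuck until NFS comes back. The timeout only frees the caller to go on and handle the failure.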
I pulled a thread dump and found that the KVMHAMonitor thread hangs in `storagePoolLookupByUUIDString` until NFS is unblocked. I haven't dug any deeper yet.

```
"Thread-20" - Thread t@135
   java.lang.Thread.State: RUNNABLE
    at com.sun.jna.Native.invokePointer(Native Method)
    at com.sun.jna.Function.invokePointer(Function.java:470)
    at com.sun.jna.Function.invoke(Function.java:404)
    at com.sun.jna.Function.invoke(Function.java:315)
    at com.sun.jna.Library$Handler.invoke(Library.java:212)
    at com.sun.proxy.$Proxy3.virStoragePoolLookupByUUIDString(Unknown Source)
    at org.libvirt.Connect.storagePoolLookupByUUIDString(Unknown Source)
    at com.cloud.hypervisor.kvm.resource.KVMHAMonitor$Monitor.runInContext(KVMHAMonitor.java:95)
    - locked <1afb3370> (a java.util.concurrent.ConcurrentHashMap)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None
```

[ Full content available at: https://github.com/apache/cloudstack/pull/2722 ]