This PR doesn't seem to completely fix the reboot problem. We installed the RC
release with this PR on a test system and are still able to get the KVM host to
reboot by using iptables to drop outgoing requests to NFS. My investigation
shows that the line [`storage =
conn.storagePoolLookupByUUIDString(uuid);`](https://github.com/apache/cloudstack/blob/4.11/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/KVMHAMonitor.java#L95)
blocks indefinitely. As a result, `kvmheartbeat.sh` is never executed, a host
investigation is started, the host with blocked NFS is marked as `Down`, and
finally all VMs on that host are rescheduled, which results in duplicate VMs.

I pulled a thread dump and found that the KVMHAMonitor thread hangs here until
NFS is unblocked. I haven't dug any deeper yet.
```
"Thread-20" - Thread t@135
   java.lang.Thread.State: RUNNABLE
    at com.sun.jna.Native.invokePointer(Native Method)
    at com.sun.jna.Function.invokePointer(Function.java:470)
    at com.sun.jna.Function.invoke(Function.java:404)
    at com.sun.jna.Function.invoke(Function.java:315)
    at com.sun.jna.Library$Handler.invoke(Library.java:212)
    at com.sun.proxy.$Proxy3.virStoragePoolLookupByUUIDString(Unknown Source)
    at org.libvirt.Connect.storagePoolLookupByUUIDString(Unknown Source)
    at com.cloud.hypervisor.kvm.resource.KVMHAMonitor$Monitor.runInContext(KVMHAMonitor.java:95)
    - locked <1afb3370> (a java.util.concurrent.ConcurrentHashMap)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None
```
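
One possible way to contain the hang, as a rough sketch that I have not tested against CloudStack: run the blocking lookup on a helper thread and wait for it with a timeout, so the monitor loop can carry on when NFS is unreachable. `BoundedLookup` and `callWithTimeout` are hypothetical names, the timeout values are arbitrary, and the `Callable` stands in for the `conn.storagePoolLookupByUUIDString(uuid)` call. One caveat: `Future.cancel(true)` only interrupts Java-level blocking, so a helper thread stuck inside the native libvirt call (as in the dump above) would most likely stay stuck until NFS comes back. This only frees the caller.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of CloudStack or libvirt-java.
final class BoundedLookup {
    // Daemon threads, so a lookup stuck in native code cannot keep the JVM alive.
    private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-lookup");
        t.setDaemon(true);
        return t;
    });

    private BoundedLookup() {
    }

    /**
     * Runs the lookup on a helper thread and waits at most timeoutMillis for it.
     * Throws TimeoutException if the lookup has not finished in time.
     */
    static <T> T callWithTimeout(Callable<T> lookup, long timeoutMillis) throws Exception {
        Future<T> future = EXECUTOR.submit(lookup);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort only: this interrupts Java-level waits, not native calls.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Surface the lookup's own failure instead of the wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            // Stand-in for a storage pool lookup that hangs on unreachable NFS.
            callWithTimeout(() -> {
                Thread.sleep(5000);
                return "pool";
            }, 200);
            System.out.println("lookup returned");
        } catch (TimeoutException e) {
            System.out.println("lookup timed out");
        }
    }
}
```

In the monitor this might be used as `callWithTimeout(() -> conn.storagePoolLookupByUUIDString(uuid), timeoutMs)`, treating a `TimeoutException` as a failed storage check. Each timed-out native call would still leave a helper thread behind, so a real fix probably needs a cap on outstanding lookups or a fix on the libvirt side.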
[ Full content available at: https://github.com/apache/cloudstack/pull/2722 ]