csquire opened a new issue #2890: KVMHAMonitor thread blocks indefinitely while 
NFS not available
URL: https://github.com/apache/cloudstack/issues/2890
 
 
   <!--
   Verify first that your issue/request is not already reported on GitHub.
   Also test if the latest release and master branch are affected too.
   Always add information AFTER these HTML comments; there is no need to delete 
the comments.
   -->
   
   ##### ISSUE TYPE
   <!-- Pick one below and delete the rest -->
    * Bug Report
   
   
   ##### COMPONENT NAME
   <!--
   Categorize the issue, e.g. API, VR, VPN, UI, etc.
   -->
   ~~~
   KVM Agent
   ~~~
   
   ##### CLOUDSTACK VERSION
   <!--
   New line separated list of affected versions, commit ID for issues on master 
branch.
   -->
   
   ~~~
   4.11.2.0-41120rc2
   ~~~
   
   ##### CONFIGURATION
   <!--
   Information about the configuration if relevant, e.g. basic network, 
advanced networking, etc.  N/A otherwise
   -->
   
   
   ##### OS / ENVIRONMENT
   <!--
   Information about the environment if relevant, N/A otherwise
   -->
   
   
   ##### SUMMARY
   <!-- Explain the problem/feature briefly -->
   Also see comment thread on PR #2722
   
   We installed an RC release that includes PR #2722 on a test system. After 
using iptables to drop NFS requests, we expected the host to be marked as 
`Disconnected`, but it is marked as `Down` instead. My investigation shows 
that the line 
[`storage = conn.storagePoolLookupByUUIDString(uuid);`](https://github.com/apache/cloudstack/blob/4.11/plugins/hypervisors/kvm/src/com/cloud/hypervisor/kvm/resource/KVMHAMonitor.java#L95) 
blocks indefinitely. As a result, kvmheartbeat.sh is never executed, a host 
investigation is started, and the host with blocked NFS is marked as `Down`. 
Finally, all VMs on that host are rescheduled, which results in duplicate VMs.
   
   I pulled a thread dump and found that the KVMHAMonitor thread hangs in 
this call until NFS is unblocked:
   
   ```
   java.lang.Thread.State: RUNNABLE
        at com.sun.jna.Native.invokePointer(Native Method)
        at com.sun.jna.Function.invokePointer(Function.java:470)
        at com.sun.jna.Function.invoke(Function.java:404)
        at com.sun.jna.Function.invoke(Function.java:315)
        at com.sun.jna.Library$Handler.invoke(Library.java:212)
        at com.sun.proxy.$Proxy3.virStoragePoolLookupByUUIDString(Unknown Source)
        at org.libvirt.Connect.storagePoolLookupByUUIDString(Unknown Source)
        at com.cloud.hypervisor.kvm.resource.KVMHAMonitor$Monitor.runInContext(KVMHAMonitor.java:95)
        - locked <1afb3370> (a java.util.concurrent.ConcurrentHashMap)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - None
   ```
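
To illustrate the failure mode, here is a minimal sketch of one way a blocking lookup could be bounded so that the heartbeat step is still reached. This is not the CloudStack code and not a proposed patch: `BoundedLookup` and `callWithTimeout` are hypothetical names, the 200 ms timeout is arbitrary, and the hanging lookup is simulated with `Thread.sleep` rather than a real libvirt call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of CloudStack.
public class BoundedLookup {
    // Daemon workers, so a permanently stuck call cannot keep the JVM alive.
    private static final ExecutorService WORKER = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-lookup");
        t.setDaemon(true);
        return t;
    });

    /**
     * Runs the call on a worker thread and waits at most timeoutMs for it.
     * Returns null on timeout or failure so the caller can carry on.
     */
    public static <T> T callWithTimeout(Callable<T> call, long timeoutMs) {
        Future<T> future = WORKER.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) interrupts the worker; a thread stuck in native
            // code may ignore the interrupt and stay blocked.
            future.cancel(true);
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a storage pool lookup that hangs while NFS is unreachable.
        Callable<String> hangingLookup = () -> {
            Thread.sleep(60_000);
            return "pool";
        };
        String pool = callWithTimeout(hangingLookup, 200);
        System.out.println(pool == null
                ? "lookup timed out, continuing to heartbeat"
                : "found " + pool);
    }
}
```

Note that a worker blocked inside a native call, as in the dump above, would likely not react to the interrupt, so each timed-out cycle could leak one thread. The sketch only shows that the monitor loop itself does not have to wait on the lookup.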
   
   ##### STEPS TO REPRODUCE
   <!--
   For bugs, show exactly how to reproduce the problem, using a minimal 
test-case. Use Screenshots if accurate.
   
   For new features, show how the feature would be used.
   -->
   
   <!-- Paste example playbooks or commands between quotes below -->
   ~~~
   
   ~~~
   
   <!-- You can also paste gist.github.com links for larger files -->
   
   ##### EXPECTED RESULTS
   <!-- What did you expect to happen when running the steps above? -->
   
   ~~~
   The host still runs kvmheartbeat.sh and shows as `Disconnected`
   ~~~
   
   ##### ACTUAL RESULTS
   <!-- What actually happened? -->
   
   <!-- Paste verbatim command output between quotes below -->
   ~~~
   The host heartbeat hangs and the host gets marked as `Down` via host investigation
   ~~~
   
