saffronjam commented on issue #7829:
URL: https://github.com/apache/cloudstack/issues/7829#issuecomment-1680235884

   Hi,
   
   I believe we have found the issue, and I would just like to report what 
happened.
   
   In the tests I used Kubernetes versions 1.24.0 and 1.27.3.
   
   These existed on two secondary storages as follows:
   
   - Storage 1: **1.24.0**
   - Storage 2: **1.24.0** & **1.27.3**
   
   The issue was that Storage 1 was misbehaving, either because of permissions or networking issues, which caused a node to get stuck in the mounting process if it decided to use that storage (which, according to @weizhouapache above, is a random choice).
   
   So, when I tried to create clusters with 1.24.0 in my tests, any hypervisor that decided to mount the 1.24.0 ISO from Storage 1 would get stuck, and until it was restarted or the share was unmounted, any other cluster that needed to mount binaries on that hypervisor could fail. That is why creation could also fail for versions that did not exist on the faulty secondary storage.
   
   The solution was to remove the storage and make sure it was unmounted on every hypervisor.
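   For anyone hitting the same symptom, here is a minimal sketch of how one could spot a hung NFS mount on a hypervisor before lazily unmounting it. This is a hypothetical helper, not anything CloudStack ships; the probe runs `stat` in a child process with a timeout, since a hung NFS server blocks `stat()` indefinitely. The timeout value is an assumption to tune for your environment.
   
   ```python
   #!/usr/bin/env python3
   """Hypothetical helper: flag unresponsive NFS mounts on a hypervisor."""
   
   import subprocess
   
   TIMEOUT_SECONDS = 5  # assumed value; tune for your environment
   
   def nfs_mounts():
       """Yield mount points of type nfs/nfs4 from /proc/mounts."""
       with open("/proc/mounts") as f:
           for line in f:
               device, mount_point, fs_type, *_ = line.split()
               if fs_type in ("nfs", "nfs4"):
                   yield mount_point
   
   def is_responsive(mount_point):
       """Return True if the mount answers a stat within the timeout.
   
       'stat' runs in a child process so a hang cannot block this script.
       Note: on a hard NFS mount, the killed child may take a moment to
       exit, since it only dies once the kernel abandons the NFS wait.
       """
       try:
           subprocess.run(
               ["stat", "-t", mount_point],
               timeout=TIMEOUT_SECONDS,
               stdout=subprocess.DEVNULL,
               stderr=subprocess.DEVNULL,
               check=True,
           )
           return True
       except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
           return False
   
   if __name__ == "__main__":
       for mp in nfs_mounts():
           status = "OK" if is_responsive(mp) else "HUNG -- consider 'umount -l'"
           print(f"{mp}: {status}")
   ```
   
   Running this on each hypervisor would have pointed straight at the stuck mount from Storage 1, which could then be lazily unmounted.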
   
   Now, I do believe this was the user's fault and that NFS should be assumed to be correctly configured, but perhaps we should look into some sort of solution where the user is informed, such as a log entry in the management server: "Failed to mount ISO on hypervisor. Server <address> might be misbehaving".
   
   Anyhow, we can close this issue, and I'd like to thank you guys for all the help. While it was not an issue with CloudStack, it gave great insight into the internal machinery, and we'll be able to troubleshoot much more easily in the future.
   

