saffronjam commented on issue #7829: URL: https://github.com/apache/cloudstack/issues/7829#issuecomment-1680235884
Hi, I believe we have found the issue, and I'd just like to report what happened.

In my tests I used Kubernetes versions 1.24.0 and 1.27.3, which existed on two secondary storages:
- Storage 1: **1.24.0**
- Storage 2: **1.24.0** & **1.27.3**

The issue was that Storage 1 was misbehaving, either because of permissions or networking issues, which caused a node to get stuck in the mounting process if it decided to use that storage (which, according to @weizhouapache above, is a random choice). So when I tried to create clusters with 1.24.0 in my tests, any hypervisor that happened to mount the 1.24.0 binaries from Storage 1 would get stuck, and until it was restarted or the share was unmounted, any other cluster that needed to mount binaries on that hypervisor could fail as well. That is why it could also fail for versions that did not exist on the faulty secondary storage.

The solution was to remove the storage and make sure it was unmounted on every hypervisor.

Now, I do believe this is the user's fault and that NFS should be assumed to be correctly configured, but perhaps we should look into some sort of solution where I could have been informed, such as a log entry in the management server: "Failed to mount ISO on hypervisor. Server <address> might be misbehaving".

Anyhow, we can close this issue, and I'd like to thank you guys for all the help. While it was not an issue with CloudStack, it gave great insight into the internal machinery, and we'll be able to troubleshoot much more easily in the future.
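For anyone hitting a similar symptom, a hung NFS export usually blocks `stat()` on the mount point indefinitely, so a probe with a timeout can flag the misbehaving server before cluster creation stalls. Below is a minimal sketch of such a health check; the path and timeout values are illustrative, not anything CloudStack itself provides:

```python
import subprocess

def mount_is_responsive(path: str, timeout_s: int = 5) -> bool:
    """Return True if `stat` on the given path completes within the timeout.

    A hung NFS mount typically blocks stat() forever, so hitting the
    timeout here is a strong hint that the backing server is misbehaving.
    """
    try:
        result = subprocess.run(
            ["stat", "-t", path],
            capture_output=True,
            timeout=timeout_s,
        )
        # Non-zero exit (e.g. path missing) also counts as unhealthy.
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

# Example: probe each secondary-storage mount point on a hypervisor
# (mount paths below are hypothetical).
for mount in ["/mnt/sec-storage-1", "/mnt/sec-storage-2"]:
    if not mount_is_responsive(mount):
        print(f"Failed to stat {mount}. Backing server might be misbehaving")
```

Running something like this on each hypervisor would have pinpointed Storage 1 directly, instead of requiring cluster-creation failures to surface the problem.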
