adietrich-ussignal opened a new issue, #9525:
URL: https://github.com/apache/cloudstack/issues/9525
<!--
Verify first that your issue/request is not already reported on GitHub.
Also test if the latest release and main branch are affected too.
Always add information AFTER of these HTML comments, but no need to delete
the comments.
-->
##### ISSUE TYPE
<!-- Pick one below and delete the rest -->
* Bug Report
##### COMPONENT NAME
<!--
Categorize the issue, e.g. API, VR, VPN, UI, etc.
-->
~~~
Agent
~~~
##### CLOUDSTACK VERSION
<!--
New line separated list of affected versions, commit ID for issues on main
branch.
-->
~~~
4.19.1.1
~~~
##### CONFIGURATION
<!--
Information about the configuration if relevant, e.g. basic network,
advanced networking, etc. N/A otherwise
-->
Advanced Networking
KVM
Consolidated Cluster:
- 2 KVM Hosts: Management Server, Agent, & Database
- 1 KVM Host: Agent
##### OS / ENVIRONMENT
<!--
Information about the environment if relevant, N/A otherwise
-->
Ubuntu 22.04
##### SUMMARY
<!-- Explain the problem/feature briefly -->
When performing a host reboot after updating general packages (e.g., kernel
updates), the agent fails to return to an "Up" state.
The error observed when attempting to disable maintnenance mode:
<img width="285" alt="Screenshot 2024-08-13 at 2 19 52 PM"
src="https://github.com/user-attachments/assets/d02f7e50-729a-4d26-b739-0c47258fe877">
The agent logs:
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo]
(Agent-Handler-1:) (logid:) Fetching CPU speed from command "lscpu".
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: ERROR [utils.linux.KVMHostInfo]
(Agent-Handler-1:) (logid:) Unable to retrieve the CPU speed from lscpu.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: java.lang.NullPointerException
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/jdk.internal.math.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/java.lang.Float.parseFloat(Float.java:455)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeedFromCommandLscpu(KVMHostInfo.java:134)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeed(KVMHostInfo.java:109)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
org.apache.cloudstack.utils.linux.KVMHostInfo.getHostInfoFromLibvirt(KVMHostInfo.java:189)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
org.apache.cloudstack.utils.linux.KVMHostInfo.<init>(KVMHostInfo.java:66)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.initialize(LibvirtComputingResource.java:3627)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
com.cloud.agent.Agent.sendStartup(Agent.java:460)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1125)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
com.cloud.utils.nio.Task.call(Task.java:83)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
com.cloud.utils.nio.Task.call(Task.java:29)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/java.lang.Thread.run(Thread.java:829)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo]
(Agent-Handler-1:) (logid:) Fetching CPU speed from file
[/sys/devices/system/cpu/cpu0/cpufreq/base_frequency].
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: ERROR [utils.linux.KVMHostInfo]
(Agent-Handler-1:) (logid:) Unable to retrieve the CPU speed from file
[/sys/devices/system/cpu/cpu0/cpufreq/base_frequency]
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: java.io.FileNotFoundException:
/sys/devices/system/cpu/cpu0/cpufreq/base_frequency (No such file or directory)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/java.io.FileInputStream.open0(Native Method)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/java.io.FileInputStream.open(FileInputStream.java:219)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/java.io.FileInputStream.<init>(FileInputStream.java:112)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/java.io.FileReader.<init>(FileReader.java:60)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeedFromFile(KVMHostInfo.java:145)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeed(KVMHostInfo.java:114)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
org.apache.cloudstack.utils.linux.KVMHostInfo.getHostInfoFromLibvirt(KVMHostInfo.java:189)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
org.apache.cloudstack.utils.linux.KVMHostInfo.<init>(KVMHostInfo.java:66)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.initialize(LibvirtComputingResource.java:3627)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
com.cloud.agent.Agent.sendStartup(Agent.java:460)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1125)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
com.cloud.utils.nio.Task.call(Task.java:83)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
com.cloud.utils.nio.Task.call(Task.java:29)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: at
java.base/java.lang.Thread.run(Thread.java:829)
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo]
(Agent-Handler-1:) (logid:) Fetching CPU speed from "host capabilities"
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [utils.linux.KVMHostInfo]
(Agent-Handler-1:) (logid:) Retrieved value [2100000000] from "host
capabilities". This corresponds to a CPU speed of [2100] MHz.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO
[kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) Host uses
control group [cgroup2fs].
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO
[kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) Calculating
the max shares of the host.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO
[kvm.resource.LibvirtComputingResource] (Agent-Handler-1:) (logid:) The max
shares of the host is [268800].
Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]: root : PWD=/ ; USER=root ;
COMMAND=/usr/bin/grep InitiatorName= /etc/iscsi/initiatorname.iscsi
Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]: pam_unix(sudo:session): session
opened for user root(uid=0) by (uid=0)
Aug 13 14:17:27 dtw-cm-kvm01-03 sudo[7077]: pam_unix(sudo:session): session
closed for user root
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: WARN [utils.script.Script]
(Agent-Handler-1:) (logid:) Execution of process [7107] for command [/bin/bash
-c dpkg -l nbdkit ] failed.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: WARN [utils.script.Script]
(Agent-Handler-1:) (logid:) Process [7107] for command [/bin/bash -c dpkg -l
nbdkit ] encountered the error: [1].
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO
[kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Attempting to
create storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9 (Filesystem) in libvirt
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO
[kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Found existing
defined storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9, using it.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO
[kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:) (logid:) Trying to fetch
storage pool 5a0edcc5-002b-4f39-84ef-b8c7e8486dc9 from libvirt
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [cloud.agent.Agent]
(Agent-Handler-2:) (logid:) Process agent startup answer, agent id = 0
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [cloud.agent.Agent]
(Agent-Handler-2:) (logid:) Set agent id 0
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO [cloud.agent.Agent]
(Agent-Handler-2:) (logid:) Startup Response Received: agent id = 0
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO
[kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac)
Attempting to create storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6
(NetworkFilesystem) in libvirt
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO
[kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac)
Found existing defined storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6, using
it.
Aug 13 14:17:27 dtw-cm-kvm01-03 java[3077]: INFO
[kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-1:) (logid:47b167ac)
Trying to fetch storage pool 456fe424-034a-3bf4-b0b5-5b5b30ed07f6 from libvirt
The management server log:
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]:
com.cloud.utils.exception.CloudRuntimeException: Could not restart agent on
Host
{"id":17,"name":"dtw-cm-kvm01-03","type":"Routing","uuid":"2549c995-78ca-411f-91f8-7a4863fe3d15"}
due to: Failed to restart cloudstack-agent.service: Interactive authentication
required.
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: See system logs and
'systemctl status cloudstack-agent.service' for details.
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
com.cloud.resource.ResourceManagerImpl.connectAndRestartAgentOnHost(ResourceManagerImpl.java:2958)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
com.cloud.resource.ResourceManagerImpl.handleAgentIfNotConnected(ResourceManagerImpl.java:2923)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
com.cloud.resource.ResourceManagerImpl.doCancelMaintenance(ResourceManagerImpl.java:2893)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:2977)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:1294)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
java.base/java.lang.reflect.Method.invoke(Method.java:566)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
com.sun.proxy.$Proxy200.cancelMaintenance(Unknown Source)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd.execute(CancelMaintenanceCmd.java:96)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:172)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:112)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:654)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:602)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: at
java.base/java.lang.Thread.run(Thread.java:829)
Aug 13 14:18:59 dtw-cm-kvm01-01 java[2254523]: INFO
[o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-12:ctx-d6de7f78 job-720)
(logid:cb386b6d) Remove job-720 from job monitoring
##### STEPS TO REPRODUCE
<!--
For bugs, show exactly how to reproduce the problem, using a minimal
test-case. Use Screenshots if accurate.
For new features, show how the feature would be used.
-->
<!-- Paste example playbooks or commands between quotes below -->
~~~
1. Place KVM host into maintenance mode
2. Perform package updates on the KVM host
3. Reboot the KVM host
~~~
<!-- You can also paste gist.github.com links for larger files -->
##### EXPECTED RESULTS
<!-- What did you expect to happen when running the steps above? -->
~~~
The agent status changes to "Up" after the reboot has been completed and the
agent is started.
~~~
##### ACTUAL RESULTS
<!-- What actually happened? -->
<!-- Paste verbatim command output between quotes below -->
~~~
The agent status is "Alert".
~~~
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]