On 27-03-2015 7:03, Dan Kenigsberg wrote:
On Thu, Mar 26, 2015 at 06:16:24PM -0300, Christopher Pereira wrote:
Continuing with the 3.6 nightly builds testing...
While hosted-engine-setup was adding the host to the newly created cluster,
VDSM crashed, probably because the gluster engine storage disappeared as in
BZ 1201355 [1]
Facts:
- the engine storage (/rhev/data-center/mmt/...) was umounted during
this process
- another mount of the same volume was still mounted after the VDSM
crash (maybe the problem is not related to gluster)
What exactly happened to vdsm? Did the process die? Why? Was it stopped?
Did it segfault? Did it stop responding? Can you share vdsm.log and
/var/log/messages showing what happened during the crash?
Hi Dan,
You will find the relevant logs here:
https://bugzilla.redhat.com/show_bug.cgi?id=1201355#c4
Summary:
1) During setup, VDSM receives a SIGTERM:
MainThread::DEBUG::2015-03-26
18:36:56,767::vdsm::66::vds::(sigtermHandler) Received signal 15
Maybe the activation process installs VDSM and/or restarts it.
2) Since the gluster storage is mounted from a VDSM ChildProcess, it
disappears when VDSM stops.
Thus, the VM is paused and will never resume (even after remounting the
storage, because the paused QEMU process keeps invalid file descriptors):
https://bugzilla.redhat.com/show_bug.cgi?id=1058300
https://bugzilla.redhat.com/show_bug.cgi?id=1172905
3) After VDSM has stopped, it cannot be restarted, because you
will get an "invalid lockspace" error in sanlock.
This can be solved with hosted-engine --start-pool.
4) You can reproduce the VDSM SIGTERM with less effort (no
need to re-deploy) by accessing the engine portal and reactivating the host.
You will see that VDSM gets stopped and the storage is lost.
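To confirm when VDSM received the signal, vdsm.log can be scanned for entries logged by sigtermHandler. The following is only a minimal sketch: the field layout is inferred from the log line quoted in point 1, it assumes each entry sits on a single line (as in the file on disk, not as wrapped in this mail), and the names LINE_RE and find_sigterm_events are made up for illustration.

```python
import re
from datetime import datetime

# Layout inferred from the quoted entry:
# thread::LEVEL::timestamp::module::lineno::logger::(function) message
LINE_RE = re.compile(
    r"^(?P<thread>.+?)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::(?P<logger>[^:]+)::"
    r"\((?P<func>[^)]*)\) (?P<msg>.*)$"
)


def find_sigterm_events(lines):
    """Return (timestamp, thread, message) for every sigtermHandler entry."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("func") == "sigtermHandler":
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            hits.append((ts, match.group("thread"), match.group("msg")))
    return hits
```

Passing an open vdsm.log file object to find_sigterm_events() gives the timestamps to compare against the engine-side activation events.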
As a workaround to keep the storage from being lost, you can mount it
manually so that it doesn't rely on the VDSM ChildProcess.
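To check whether the manual mount survives a VDSM restart, the mount table can be inspected before and after reactivating the host. This is a rough sketch only: the function name is_mounted is made up, and the mount point and volume shown in the comment are hypothetical placeholders, not the real engine storage path.

```python
def is_mounted(mount_point, mounts_text):
    """Return True if mount_point is a mount target in /proc/mounts-style text.

    Each line of /proc/mounts has the form:
        <source> <target> <fstype> <options> <dump> <pass>
    e.g. (hypothetical) "h2:/engine /mnt/engine fuse.glusterfs rw,relatime 0 0"
    Note: targets containing spaces are escaped as \\040 in /proc/mounts;
    this sketch does not unescape them.
    """
    target = mount_point.rstrip("/") or "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == target:
            return True
    return False
```

On the host, this would be called with the contents of /proc/mounts, once before and once after VDSM is stopped.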
Questions:
1) I'm afraid that by activating the host manually after an interrupted
setup I may be skipping some special configurations.
Is there any difference between activating the host manually from the
web-manager and activating the host with the setup script?
How can I complete the setup manually?
Status:
I'm still unable to activate the host manually, because the engine is now
having problems with the JsonRPC communication:
2015-03-27 10:11:54,889 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp
Reactor) [] Connecting to h2.imatronix.com/209.126.105.36
2015-03-27 10:11:54,893 ERROR
[org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor)
[] *Unable to process messages*
2015-03-27 10:11:54,893 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.ListVDSCommand]
(DefaultQuartzScheduler_Worker-96) [] Command
'ListVDSCommand(HostName = h2, HostId =
46d4659a-4efe-4427-aa68-a4536508fa08,
vds=Host[h2,46d4659a-4efe-4427-aa68-a4536508fa08])' execution
failed: VDSGenericException: VDSNetworkException: General SSLEngine
problem
2015-03-27 10:11:54,894 ERROR
[org.ovirt.engine.core.utils.timer.SchedulerUtilQuartzImpl]
(DefaultQuartzScheduler_Worker-96) [] Failed to invoke scheduled
method vmsMonitoring: null
2015-03-27 10:11:57,894 INFO
[org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp
Reactor) [] Connecting to h2.imatronix.com/209.126.105.36
2015-03-27 10:11:57,897 ERROR
[org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor)
[] *Unable to process messages*
2015-03-27 10:11:57,897 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(DefaultQuartzScheduler_Worker-95)
[] Command
'org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand' return
value
'org.ovirt.engine.core.vdsbroker.vdsbroker.VDSInfoReturnForXmlRpc@79313585'
2015-03-27 10:11:57,898 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(DefaultQuartzScheduler_Worker-95)
[] HostName = h2
2015-03-27 10:11:57,898 ERROR
[org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
(DefaultQuartzScheduler_Worker-95)
[] Command 'GetCapabilitiesVDSCommand(HostName = h2, HostId =
46d4659a-4efe-4427-aa68-a4536508fa08,
vds=Host[h2,46d4659a-4efe-4427-aa68-a4536508fa08])' execution
failed: VDSGenericException: VDSNetworkException: *General SSLEngine
problem*
2015-03-27 10:11:57,898 ERROR
[org.ovirt.engine.core.vdsbroker.HostMonitoring]
(DefaultQuartzScheduler_Worker-95) [] Failure to refresh Vds runtime
info: VDSGenericException: VDSNetworkException: General SSLEngine
problem
2015-03-27 10:11:57,898 ERROR
[org.ovirt.engine.core.vdsbroker.HostMonitoring]
(DefaultQuartzScheduler_Worker-95) [] Exception:
org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException:
VDSGenericException: VDSNetworkException: General SSLEngine problem
at
org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase.proceedProxyReturnValue(BrokerCommandBase.java:183)
[vdsbroker.jar:]
at
org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand.executeVdsBrokerCommand(GetCapabilitiesVDSCommand.java:16)
[vdsbroker.jar:]
at
org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:101)
[vdsbroker.jar:]
at
org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:55)
[vdsbroker.jar:]
at
org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:33)
[dal.jar:]
at
org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:465)
[vdsbroker.jar:]
at
org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:587)
[vdsbroker.jar:]
at
org.ovirt.engine.core.vdsbroker.HostMonitoring.refreshVdsRunTimeInfo(HostMonitoring.java:111)
[vdsbroker.jar:]
at
org.ovirt.engine.core.vdsbroker.HostMonitoring.refresh(HostMonitoring.java:76)
[vdsbroker.jar:]
at
org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:199)
[vdsbroker.jar:]
at sun.reflect.GeneratedMethodAccessor39.invoke(Unknown
Source) [:1.7.0_75]
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[rt.jar:1.7.0_75]
at java.lang.reflect.Method.invoke(Method.java:606)
[rt.jar:1.7.0_75]
at
org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:81)
[scheduler.jar:]
at
org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:52)
[scheduler.jar:]
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
[quartz.jar:]
at
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
[quartz.jar:]
2015-03-27 10:11:57,899 WARN
[org.ovirt.engine.core.vdsbroker.VdsManager]
(DefaultQuartzScheduler_Worker-95) [] Failed to refresh VDS, network
error, continuing, vds='h2'(46d4659a-4efe-4427-aa68-a4536508fa08):
VDSGenericException: VDSNetworkException: *General SSLEngine problem*
[...]
On the VDSM side, we have:
clientIFinit::DEBUG::2015-03-27
10:11:53,098::task::592::Storage.TaskManager.Task::(_updateState)
Task=`87ed5b66-3abb-4edc-aec3-59f071b33276`::moving from state init
-> state preparing
clientIFinit::INFO::2015-03-27
10:11:53,098::logUtils::48::dispatcher::(wrapper) Run and protect:
getConnectedStoragePoolsList(options=None)
clientIFinit::INFO::2015-03-27
10:11:53,098::logUtils::51::dispatcher::(wrapper) Run and protect:
getConnectedStoragePoolsList, Return response: {'poollist': []}
clientIFinit::DEBUG::2015-03-27
10:11:53,098::task::1188::Storage.TaskManager.Task::(prepare)
Task=`87ed5b66-3abb-4edc-aec3-59f071b33276`::finished: {'poollist': []}
clientIFinit::DEBUG::2015-03-27
10:11:53,098::task::592::Storage.TaskManager.Task::(_updateState)
Task=`87ed5b66-3abb-4edc-aec3-59f071b33276`::moving from state
preparing -> state finished
clientIFinit::DEBUG::2015-03-27
10:11:53,098::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
clientIFinit::DEBUG::2015-03-27
10:11:53,098::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
clientIFinit::DEBUG::2015-03-27
10:11:53,098::task::990::Storage.TaskManager.Task::(_decref)
Task=`87ed5b66-3abb-4edc-aec3-59f071b33276`::ref 0 aborting False
Detector thread::DEBUG::2015-03-27
10:11:53,450::protocoldetector::201::vds.MultiProtocolAcceptor::(_add_connection)
*Adding connection 209.239.124.8:54218*
Detector thread::DEBUG::2015-03-27
10:11:53,459::protocoldetector::225::vds.MultiProtocolAcceptor::(_process_handshake)
*Error during handshake: sslv3 alert certificate unknown*
Detector thread::DEBUG::2015-03-27
10:11:53,459::protocoldetector::215::vds.MultiProtocolAcceptor::(_remove_connection)
Removing connection 209.239.124.8:54218
Detector thread::DEBUG::2015-03-27
10:11:55,249::protocoldetector::201::vds.MultiProtocolAcceptor::(_add_connection)
Adding connection 209.126.113.73:54119
Detector thread::DEBUG::2015-03-27
10:11:55,252::protocoldetector::225::vds.MultiProtocolAcceptor::(_process_handshake)
Error during handshake: unexpected eof
Detector thread::DEBUG::2015-03-27
10:11:55,252::protocoldetector::215::vds.MultiProtocolAcceptor::(_remove_connection)
Removing connection 209.126.113.73:54119
Detector thread::DEBUG::2015-03-27
10:11:56,582::protocoldetector::201::vds.MultiProtocolAcceptor::(_add_connection)
Adding connection 209.239.124.8:39606
Detector thread::DEBUG::2015-03-27
10:11:56,629::protocoldetector::225::vds.MultiProtocolAcceptor::(_process_handshake)
Error during handshake: sslv3 alert certificate unknown
Detector thread::DEBUG::2015-03-27
10:11:56,629::protocoldetector::215::vds.MultiProtocolAcceptor::(_remove_connection)
Removing connection 209.239.124.8:39606
clientIFinit::DEBUG::2015-03-27
10:11:58,104::task::592::Storage.TaskManager.Task::(_updateState)
Task=`9e6db6dc-3ce0-4e93-8ddd-2aa1d09fa687`::moving from state init
-> state preparing
clientIFinit::INFO::2015-03-27
10:11:58,104::logUtils::48::dispatcher::(wrapper) Run and protect:
getConnectedStoragePoolsList(options=None)
clientIFinit::INFO::2015-03-27
10:11:58,104::logUtils::51::dispatcher::(wrapper) Run and protect:
getConnectedStoragePoolsList, Return response: {'poollist': []}
clientIFinit::DEBUG::2015-03-27
10:11:58,104::task::1188::Storage.TaskManager.Task::(prepare)
Task=`9e6db6dc-3ce0-4e93-8ddd-2aa1d09fa687`::finished: {'poollist': []}
clientIFinit::DEBUG::2015-03-27
10:11:58,104::task::592::Storage.TaskManager.Task::(_updateState)
Task=`9e6db6dc-3ce0-4e93-8ddd-2aa1d09fa687`::moving from state
preparing -> state finished
clientIFinit::DEBUG::2015-03-27
10:11:58,104::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
clientIFinit::DEBUG::2015-03-27
10:11:58,104::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
clientIFinit::DEBUG::2015-03-27
10:11:58,104::task::990::Storage.TaskManager.Task::(_decref)
Task=`9e6db6dc-3ce0-4e93-8ddd-2aa1d09fa687`::ref 0 aborting False
[...]
I guess this is related to an invalid certificate or some protocol
version mismatch.
How can I fix it?
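
To see which peers fail the handshake and with which error, the detector entries in vdsm.log can be tallied. This is a minimal sketch under some assumptions: it relies on the "Adding connection" and "Error during handshake" messages appearing exactly as in the excerpt above, it attributes each error to the most recently added connection (which holds for the excerpt but may not hold under concurrent connections), and the names EVENT_RE and handshake_errors_by_peer are made up for illustration.

```python
import re
from collections import Counter

# Matches either a new connection (capturing the peer IP) or a handshake error.
EVENT_RE = re.compile(
    r"Adding connection (?P<ip>\d{1,3}(?:\.\d{1,3}){3}):\d+"
    r"|Error during handshake: (?P<err>[^\n*]+)"
)


def handshake_errors_by_peer(log_text):
    """Count handshake errors per (peer IP, error message) pair."""
    errors = Counter()
    last_ip = None
    for match in EVENT_RE.finditer(log_text):
        if match.group("ip"):
            # Remember the peer of the most recently added connection.
            last_ip = match.group("ip")
        elif last_ip is not None:
            errors[(last_ip, match.group("err").strip())] += 1
    return errors
```

Applied to the excerpt above, this counts two "sslv3 alert certificate unknown" errors from 209.239.124.8 and one "unexpected eof" from 209.126.113.73.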
Regards,
Christopher