----- Original Message -----
> From: "Nir Soffer" <[email protected]>
> To: "Simone Tiraboschi" <[email protected]>
> Cc: [email protected], "Fabian Deutsch" <[email protected]>
> Sent: Friday, May 29, 2015 5:45:48 PM
> Subject: Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
>
> ----- Original Message -----
> > From: "Simone Tiraboschi" <[email protected]>
> > To: "Nir Soffer" <[email protected]>
> > Cc: [email protected], "Fabian Deutsch" <[email protected]>
> > Sent: Friday, May 29, 2015 6:42:08 PM
> > Subject: Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
> >
> > ----- Original Message -----
> > > From: "Nir Soffer" <[email protected]>
> > > To: "Simone Tiraboschi" <[email protected]>
> > > Cc: [email protected], "Fabian Deutsch" <[email protected]>
> > > Sent: Friday, May 29, 2015 5:26:52 PM
> > > Subject: Re: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
> > >
> > > ----- Original Message -----
> > > > From: "Simone Tiraboschi" <[email protected]>
> > > > To: [email protected]
> > > > Cc: "Fabian Deutsch" <[email protected]>
> > > > Sent: Friday, May 29, 2015 1:44:02 PM
> > > > Subject: [ovirt-devel] oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart
> > > >
> > > > Hi,
> > > > I tried to have hosted-engine deploy the engine appliance over oVirt node.
> > > > I think it will be quite a common scenario.
> > > > I tried with an oVirt node build from yesterday.
> > > >
> > > > Unfortunately I'm not able to conclude the setup because oVirt node got the CPU load indefinitely stuck on 100% and so it's almost unresponsive.
> > > >
> > > > The issue seems to be related to the vdsmd daemon, which couldn't really start and so it retries indefinitely using all the available CPU power (it also runs with niceness -20...).
> > > >
> > > > [root@node36 admin]# grep "Unit vdsmd.service entered failed state." /var/log/messages | wc -l
> > > > 368
> > > > It tried 368 times in a row in a few minutes.
> > > >
> > > > With journalctl I can read:
> > > > May 29 10:06:45 node36 systemd[1]: Unit vdsmd.service entered failed state.
> > > > May 29 10:06:45 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
> > > > May 29 10:06:45 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
> > > > May 29 10:06:45 node36 systemd[1]: Starting Virtual Desktop Server Manager...
> > > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running mkdirs
> > > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_coredump
> > > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_vdsm_logs
> > > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running wait_for_network
> > > > May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running run_init_hooks
> > > > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running upgraded_version_check
> > > > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running check_is_configured
> > > > May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running validate_configuration
> > > > May 29 10:06:47 node36 vdsmd_init_common.sh[13697]: vdsm: Running prepare_transient_repository
> > > > May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running syslog_available
> > > > May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running nwfilter
> > > > May 29 10:06:50 node36 vdsmd_init_common.sh[13697]: vdsm: Running dummybr
> > > > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running load_needed_modules
> > > > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running tune_system
> > > > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_space
> > > > May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_lo
> > > > May 29 10:06:51 node36 systemd[1]: Started Virtual Desktop Server Manager.
> > > > May 29 10:06:51 node36 systemd[1]: vdsmd.service: main process exited, code=exited, status=1/FAILURE
> > > > May 29 10:06:51 node36 vdsmd_init_common.sh[13821]: vdsm: Running run_final_hooks
> > > > May 29 10:06:52 node36 systemd[1]: Unit vdsmd.service entered failed state.
> > > > May 29 10:06:52 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
> > > > May 29 10:06:52 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
> > > > May 29 10:06:52 node36 systemd[1]: Starting Virtual Desktop Server Manager...
> > > > (repeated many times)
> > > >
> > > > /var/log/vdsm/vdsm.log is empty.
> > > >
> > > > while
> > > > [root@node36 admin]# /usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm; echo $?
> > > > 1
> > >
> > > Can you try to run vdsm manually from the shell?
> > >
> > > # /usr/share/vdsm/vdsm
> > >
> > > Typically you would see a python traceback explaining the failure.
> >
> > I tried and it just fails.
> > Exit code is 1
>
> Can you show an strace of the failure?
>
> # strace /usr/share/vdsm/vdsm

It's getting a lot of
stat("/usr/share/vdsm/virt/caps", 0x7fff12cba270) = -1 ENOENT (No such file or directory)
open("/usr/share/vdsm/virt/caps.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/vdsm/virt/capsmodule.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/vdsm/virt/caps.py", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/vdsm/virt/caps.pyc", O_RDONLY) = -1 ENOENT (No such file or directory)
on almost all the modules.
I'm attaching the full strace.
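
To narrow this down, here is a minimal sketch (not part of vdsm) that summarizes such a trace. It assumes the strace lines look like the excerpt above, one call per line. Python 2 probes several file names (name, name.so, namemodule.so, name.py, name.pyc) in each directory it searches, so an ENOENT in one directory can be normal; the helper therefore reports only the names that were never opened successfully anywhere in the trace. The names summarize and module_name are hypothetical, chosen for this example.

```python
import re
from collections import defaultdict

# Matches open()/stat() calls and their return value. The line format is
# assumed from the strace excerpt above; an optional pid prefix is tolerated.
CALL_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?:open|stat)\("([^"]+)",.*\)\s+=\s+(-?\d+)'
)

# File-name suffixes probed during a Python 2 import, longest first so that
# "capsmodule.so" maps to "caps" rather than "capsmodule".
SUFFIXES = ("module.so", ".so", ".pyc", ".py")


def module_name(path):
    """Reduce a probed path to the bare module name it was probing for."""
    base = path.rsplit("/", 1)[-1]
    for suffix in SUFFIXES:
        if base.endswith(suffix):
            return base[:-len(suffix)]
    return base


def summarize(lines):
    """Return {name: failed_lookups} for names that never resolved."""
    failed = defaultdict(int)
    found = set()
    for line in lines:
        match = CALL_RE.match(line)
        if not match:
            continue
        path, ret = match.groups()
        name = module_name(path)
        if int(ret) < 0:
            failed[name] += 1
        else:
            found.add(name)
    # Drop names that were found in some other directory later (or earlier).
    return {name: count for name, count in failed.items() if name not in found}
```

After unpacking vdsm_strace.gz, something like summarize(open("vdsm_strace")) (the unpacked file name is an assumption) should list the modules that were never found at all, which are the likely candidates for the import that makes vdsm exit with 1.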
thanks,
Simone
vdsm_strace.gz
Description: GNU Zip compressed data
