On 06/01/2015 03:56 AM, Simone Tiraboschi wrote:


----- Original Message -----
From: "Douglas Schilling Landgraf" <dougsl...@redhat.com>
To: "Simone Tiraboschi" <stira...@redhat.com>, devel@ovirt.org
Cc: "Fabian Deutsch" <fdeut...@redhat.com>
Sent: Saturday, May 30, 2015 11:28:38 PM
Subject: Re: oVirt node 3.6 and CPU load indefinitely stuck on 100% while vdsmd indefinitely tries to restart

On 05/29/2015 06:44 AM, Simone Tiraboschi wrote:
Hi,
I tried to have hosted-engine deploy the engine appliance on oVirt
node. I think it will be quite a common scenario.
I tried with an oVirt node build from yesterday.

Unfortunately I'm not able to complete the setup because the CPU load on
oVirt node gets stuck at 100% indefinitely, so the node is almost
unresponsive.

The issue seems to be related to the vdsmd daemon, which can't actually
start and so keeps retrying indefinitely, using all the available CPU
power (it also runs with niceness -20...).

[root@node36 admin]# grep "Unit vdsmd.service entered failed state." /var/log/messages | wc -l
368
It tried 368 times in a row in a few minutes.
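The same count can be wrapped in a small function that reads log text from standard input, so it works on /var/log/messages as well as on journal output (for example piped from `journalctl -u vdsmd.service --no-pager`, which should also include systemd's own messages about the unit). This is only a sketch: `count_vdsmd_failures` is a hypothetical name, and the match string is the one from the grep above.

```shell
#!/bin/sh
# Hypothetical helper: count how many times systemd reported that
# vdsmd.service entered the failed state, reading log text from stdin.
count_vdsmd_failures() {
    # -F: match the message literally (the trailing '.' is not a wildcard)
    # -c: print the number of matching lines, so no separate 'wc -l' is needed
    grep -F -c "Unit vdsmd.service entered failed state."
}

# Example usage (either source should work):
#   count_vdsmd_failures < /var/log/messages
#   journalctl -u vdsmd.service --no-pager | count_vdsmd_failures
```

Note that `grep -c` prints `0` but exits with status 1 when nothing matches, so callers running under `set -e` may want `|| true`.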

With journalctl I can read:
May 29 10:06:45 node36 systemd[1]: Unit vdsmd.service entered failed state.
May 29 10:06:45 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
May 29 10:06:45 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
May 29 10:06:45 node36 systemd[1]: Starting Virtual Desktop Server Manager...
May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running mkdirs
May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_coredump
May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running configure_vdsm_logs
May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running wait_for_network
May 29 10:06:45 node36 vdsmd_init_common.sh[13697]: vdsm: Running run_init_hooks
May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running upgraded_version_check
May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running check_is_configured
May 29 10:06:46 node36 vdsmd_init_common.sh[13697]: vdsm: Running validate_configuration
May 29 10:06:47 node36 vdsmd_init_common.sh[13697]: vdsm: Running prepare_transient_repository
May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running syslog_available
May 29 10:06:49 node36 vdsmd_init_common.sh[13697]: vdsm: Running nwfilter
May 29 10:06:50 node36 vdsmd_init_common.sh[13697]: vdsm: Running dummybr
May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running load_needed_modules
May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running tune_system
May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_space
May 29 10:06:51 node36 vdsmd_init_common.sh[13697]: vdsm: Running test_lo
May 29 10:06:51 node36 systemd[1]: Started Virtual Desktop Server Manager.
May 29 10:06:51 node36 systemd[1]: vdsmd.service: main process exited, code=exited, status=1/FAILURE
May 29 10:06:51 node36 vdsmd_init_common.sh[13821]: vdsm: Running run_final_hooks
May 29 10:06:52 node36 systemd[1]: Unit vdsmd.service entered failed state.
May 29 10:06:52 node36 systemd[1]: vdsmd.service holdoff time over, scheduling restart.
May 29 10:06:52 node36 systemd[1]: Stopping Virtual Desktop Server Manager...
May 29 10:06:52 node36 systemd[1]: Starting Virtual Desktop Server Manager...
(the same sequence repeats many times)

/var/log/vdsm/vdsm.log is empty, while running vdsm by hand exits with
status 1:
[root@node36 admin]# /usr/share/vdsm/daemonAdapter -0 /dev/null -1 /dev/null -2 /dev/null /usr/share/vdsm/vdsm; echo $?
1


Thanks for the report, Simone. From my tests, you are hitting:

non-root user cannot `from ovirtnode import ovirtfunctions`: permission denied: '/var/log/ovirt-node.log' and '/var/log/ovirt.log'
https://bugzilla.redhat.com/show_bug.cgi?id=1224400

We should fix this bug very soon. The workaround is to run chmod o+rw on
/var/log/ovirt.log and /var/log/ovirt-node.log.
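The workaround can be wrapped in a small function that only touches files that actually exist. This is a sketch, not part of oVirt Node: `apply_ovirt_log_workaround` is a hypothetical name, and o+rw leaves both logs world-writable, so it is only a stopgap until BZ 1224400 is fixed. To confirm the original failure first, something like `sudo -u vdsm python -c 'from ovirtnode import ovirtfunctions'` would presumably reproduce the permission error, assuming vdsm runs as the `vdsm` user on this build.

```shell
#!/bin/sh
# Hypothetical helper applying the temporary workaround for BZ 1224400:
# grant "other" users read/write access to the given oVirt Node log files
# so that a non-root process can import ovirtnode.ovirtfunctions.
# Note: o+rw makes these logs world-writable; treat it as a stopgap only.
apply_ovirt_log_workaround() {
    status=0
    for f in "$@"; do
        if [ -e "$f" ]; then
            # Remember a chmod failure but keep processing the other files
            chmod o+rw "$f" || status=1
        else
            # Skip missing files instead of failing the whole run
            echo "skipping missing file: $f" >&2
        fi
    done
    return "$status"
}

# Example (as root):
#   apply_ovirt_log_workaround /var/log/ovirt.log /var/log/ovirt-node.log
```

Skipping missing files keeps the helper safe to run on a node where one of the two logs has not been created yet.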

OK. I tried
[root@node36 admin]# chmod o+rw /var/log/ovirt.log /var/log/ovirt-node.log

but now I'm getting:
[root@node36 admin]# systemctl status -l vdsmd
vdsmd.service - Virtual Desktop Server Manager
    Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled)
    Active: active (running) since Mon 2015-06-01 07:53:09 UTC; 17s ago
   Process: 4040 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh --post-stop (code=exited, status=0/SUCCESS)
   Process: 4049 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS)
  Main PID: 4164 (vdsm)
    CGroup: /system.slice/vdsmd.service
            └─4164 /usr/bin/python /usr/share/vdsm/vdsm

Jun 01 07:53:07 node36 vdsmd_init_common.sh[4049]: vdsm: Running nwfilter
Jun 01 07:53:08 node36 vdsmd_init_common.sh[4049]: vdsm: Running dummybr
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running load_needed_modules
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running tune_system
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running test_space
Jun 01 07:53:09 node36 vdsmd_init_common.sh[4049]: vdsm: Running test_lo
Jun 01 07:53:09 node36 systemd[1]: Started Virtual Desktop Server Manager.
Jun 01 07:53:10 node36 vdsm[4164]: vdsm vds ERROR failed to init clientIF, shutting down storage dispatcher
Jun 01 07:53:10 node36 vdsm[4164]: vdsm vds ERROR Exception raised
                                    Traceback (most recent call last):
                                      File "/usr/share/vdsm/vdsm", line 154, in run
                                        serve_clients(log)
                                      File "/usr/share/vdsm/vdsm", line 93, in serve_clients
                                        cif = clientIF.getInstance(irs, log)
                                      File "/usr/share/vdsm/clientIF.py", line 166, in getInstance
                                      File "/usr/share/vdsm/clientIF.py", line 112, in __init__
                                      File "/usr/share/vdsm/clientIF.py", line 170, in _createAcceptor
                                      File "/usr/share/vdsm/clientIF.py", line 183, in _createSSLContext
                                      File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 149, in __init__
                                      File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 174, in _initContext
                                      File "/usr/lib/python2.7/site-packages/vdsm/sslutils.py", line 153, in _loadCertChain
                                      File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Context.py", line 100, in load_cert_chain
                                    SSLError: No such file or directory
Jun 01 07:53:20 node36 vdsm[4164]: vdsm vds ERROR Vm's recovery failed
                                    Traceback (most recent call last):
                                      File "/usr/share/vdsm/clientIF.py", line 416, in _recoverExistingVms
                                      File "/usr/share/vdsm/caps.py", line 177, in __init__
                                      File "/usr/share/vdsm/caps.py", line 209, in _getCpuTopology
                                      File "/usr/share/vdsm/caps.py", line 199, in _getFreshCapsXMLStr
                                      File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 162, in get
                                      File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 99, in open_connection
                                      File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 1008, in retry
                                      File "/usr/lib64/python2.7/site-packages/libvirt.py", line 105, in openAuth
                                    libvirtError: authentication failed: polkit: polkit\56retains_authorization_after_challenge=1
                                    Authorization requires authentication but no agent is available.
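The first traceback ends in M2Crypto's load_cert_chain raising "No such file or directory", which suggests that a certificate or key file vdsm tries to load is not present on the node. A quick check is a small function that lists missing or unreadable files. This is a sketch: `check_ssl_files` is a hypothetical name, and the paths in the usage comment are assumed to be vdsm's default PKI locations, not something confirmed in this thread; the node's vdsm configuration is the authority on the actual paths.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given files are missing or
# unreadable. Returns 0 only if every file exists and is readable.
# Note: -r is evaluated for the user running the check; as root it
# effectively only tests that the file exists.
check_ssl_files() {
    missing=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example, ASSUMING vdsm's default PKI layout (not confirmed in this thread):
#   check_ssl_files /etc/pki/vdsm/certs/vdsmcert.pem \
#                   /etc/pki/vdsm/keys/vdsmkey.pem \
#                   /etc/pki/vdsm/certs/cacert.pem
```

This only narrows down the SSL error; it says nothing about the separate libvirt/polkit authentication failure in the second traceback.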

Was it just a partial workaround or am I facing a different issue?

It's probably a different issue; I will try to reproduce it locally.

--
Cheers
Douglas
_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel
