On Wed, 30 Aug 2017 14:18:49 +0300 Dan Kenigsberg <[email protected]> wrote:
> On Wed, Aug 30, 2017 at 1:40 PM, Barak Korren <[email protected]> wrote:
> > Test failed: [ 002_bootstrap.add_hosts ]
> >
> > Link to suspected patches:
> >
> > We suspect this is due to a change to LLDPAD in the upstream CentOS
> > repos. We can't tell the exact point at which it was introduced,
> > because other ovirt-engine regressions introduced too much noise
> > into the system.
> >
> > It also seems that the failure is not 100% reproducible, since we
> > have runs that do not encounter it.
> >
> > Link to job:
> > http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/2151
> >
> > (This job checked a specific patch to ovirt-log-collector, but the
> > failure seems unrelated, and it also happens on the vanilla 'tested'
> > repo right now.)
> >
> > Link to all logs:
> > The VDSM logs seem to be most relevant here:
> >
> > http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/2151/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/vdsm/
> >
> > Error snippet from the log:
> >
> > - From VDSM logs (note it exits very quickly):
> >
> > <error>
> >
> > 2017-08-30 05:55:52,208-0400 INFO (MainThread) [vds] (PID: 4594) I am the actual vdsm 4.20.2-133.git0ce4485.el7.centos lago-basic-suite-master-host-0 (3.10.0-514.2.2.el7.x86_64) (vdsmd:148)
> > 2017-08-30 05:55:52,209-0400 INFO (MainThread) [vds] VDSM will run with cpu affinity: frozenset([1]) (vdsmd:254)
> > 2017-08-30 05:55:52,253-0400 INFO (MainThread) [storage.check] Starting check service (check:92)
> > 2017-08-30 05:55:52,257-0400 INFO (MainThread) [storage.Dispatcher] Starting StorageDispatcher... (dispatcher:48)
> > 2017-08-30 05:55:52,257-0400 INFO (check/loop) [storage.asyncevent] Starting <EventLoop running=True closed=False at 0x45317520> (asyncevent:125)
> > 2017-08-30 05:55:52,288-0400 INFO (MainThread) [vdsm.api] START registerDomainStateChangeCallback(callbackFunc=<functools.partial object at 0x2b47b50>) from=internal, task_id=2cebe9ef-358e-47d7-81a6-8c4b54b9cd6d (api:46)
> > 2017-08-30 05:55:52,288-0400 INFO (MainThread) [vdsm.api] FINISH registerDomainStateChangeCallback return=None from=internal, task_id=2cebe9ef-358e-47d7-81a6-8c4b54b9cd6d (api:52)
> > 2017-08-30 05:55:52,288-0400 INFO (MainThread) [MOM] Preparing MOM interface (momIF:53)
> > 2017-08-30 05:55:52,289-0400 INFO (MainThread) [MOM] Using named unix socket /var/run/vdsm/mom-vdsm.sock (momIF:62)
> > 2017-08-30 05:55:52,289-0400 INFO (MainThread) [root] Unregistering all secrets (secret:91)
> > 2017-08-30 05:55:52,307-0400 INFO (MainThread) [vds] Setting channels' timeout to 30 seconds. (vmchannels:221)
> > 2017-08-30 05:55:52,309-0400 INFO (vmrecovery) [vds] recovery: completed in 0s (clientIF:516)
> > 2017-08-30 05:55:52,310-0400 INFO (MainThread) [vds.MultiProtocolAcceptor] Listening at :::54321 (protocoldetector:196)
> > 2017-08-30 05:55:52,496-0400 INFO (http) [vds] Server running (http:58)
> > 2017-08-30 05:55:52,742-0400 INFO (periodic/1) [vdsm.api] START repoStats(domains=()) from=internal, task_id=85768015-8ecb-48e3-9307-f671bfc33c65 (api:46)
> > 2017-08-30 05:55:52,743-0400 INFO (periodic/1) [vdsm.api] FINISH repoStats return={} from=internal, task_id=85768015-8ecb-48e3-9307-f671bfc33c65 (api:52)
> > 2017-08-30 05:55:52,743-0400 WARN (periodic/1) [throttled] MOM not available. (throttledlog:103)
> > 2017-08-30 05:55:52,744-0400 WARN (periodic/1) [throttled] MOM not available, KSM stats will be missing. (throttledlog:103)
> > 2017-08-30 05:55:55,043-0400 INFO (MainThread) [vds] Received signal 15, shutting down (vdsmd:67)
> > 2017-08-30 05:55:55,045-0400 INFO (MainThread) [jsonrpc.JsonRpcServer] Stopping JsonRPC Server (__init__:759)
> > 2017-08-30 05:55:55,049-0400 INFO (MainThread) [vds] Stopping http server (http:79)
> > 2017-08-30 05:55:55,049-0400 INFO (http) [vds] Server stopped (http:69)
> > 2017-08-30 05:55:55,050-0400 INFO (MainThread) [root] Unregistering all secrets (secret:91)
> > 2017-08-30 05:55:55,052-0400 INFO (MainThread) [vdsm.api] START prepareForShutdown(options=None) from=internal, task_id=ffde5caa-fa44-49ab-bdd1-df81519680a3 (api:46)
> > 2017-08-30 05:55:55,089-0400 INFO (MainThread) [storage.Monitor] Shutting down domain monitors (monitor:222)
> > 2017-08-30 05:55:55,090-0400 INFO (MainThread) [storage.check] Stopping check service (check:105)
> > 2017-08-30 05:55:55,090-0400 INFO (check/loop) [storage.asyncevent] Stopping <EventLoop running=True closed=False at 0x45317520> (asyncevent:220)
> > 2017-08-30 05:55:55,090-0400 INFO (MainThread) [vdsm.api] FINISH prepareForShutdown return=None from=internal, task_id=ffde5caa-fa44-49ab-bdd1-df81519680a3 (api:52)
> > 2017-08-30 05:55:55,091-0400 INFO (MainThread) [vds] Stopping threads (vdsmd:159)
> > 2017-08-30 05:55:55,091-0400 INFO (MainThread) [vds] Exiting (vdsmd:170)
> >
> > </error>
> >
> > - From SuperVDSM logs:
> >
> > <error>
> >
> > MainThread::ERROR::2017-08-30 05:55:59,476::initializer::53::root::(_lldp_init) Failed to enable LLDP on eth1
> > Traceback (most recent call last):
> >   File "/usr/lib/python2.7/site-packages/vdsm/network/initializer.py", line 51, in _lldp_init
> >     Lldp.enable_lldp_on_iface(device)
> >   File "/usr/lib/python2.7/site-packages/vdsm/network/lldp/lldpad.py", line 30, in enable_lldp_on_iface
> >     lldptool.enable_lldp_on_iface(iface, rx_only)
> >   File "/usr/lib/python2.7/site-packages/vdsm/network/lldpad/lldptool.py", line 46, in enable_lldp_on_iface
> >     raise EnableLldpError(rc, out, err, iface)
> > EnableLldpError: (1, "timeout\n'M00000001C3040000000c04eth1000badminStatus0002rx' command timed out.\n", '', 'eth1')
> > MainThread::DEBUG::2017-08-30 05:55:59,477::cmdutils::133::root::(exec_cmd) /usr/sbin/lldptool get-lldp -i eth0 adminStatus (cwd None)
> >
> > </error>
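
For anyone who wants to poke at this outside of vdsm: judging only by the
paths and names in the traceback, the failing call boils down to roughly
the sketch below. The EnableLldpError class and the function body here
are reconstructions, not the actual vdsm code.

import subprocess

class EnableLldpError(Exception):
    """Raised when lldptool fails to enable LLDP on an interface."""

def enable_lldp_on_iface(iface, rx_only=True):
    # lldptool talks to the lldpad daemon over a local socket; when
    # lldpad is blocked (here: by SELinux), the request times out and
    # lldptool exits non-zero, which surfaces as EnableLldpError.
    status = 'rx' if rx_only else 'rxtx'
    p = subprocess.Popen(
        ['/usr/sbin/lldptool', 'set-lldp', '-i', iface,
         'adminStatus=' + status],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    if p.returncode != 0:
        raise EnableLldpError(p.returncode, out, err, iface)

Running the equivalent command by hand (/usr/sbin/lldptool set-lldp -i
eth1 adminStatus=rx) should reproduce the same timeout while lldpad is
being blocked.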

> The LLDP errors are due to SELinux, to be fixed in CentOS-7.4.1. They
> do not cause supervdsmd to stop, so they are not the reason for vdsm's
> failure to start. Something ELSE is doing this.
>
> In
> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/2151/artifact/exported-artifacts/basic-suit-master-el7/test_logs/basic-suite-master/post-002_bootstrap.py/lago-basic-suite-master-host-0/_var_log/messages/*view*/
> we have
>
> Aug 30 05:55:52 lago-basic-suite-master-host-0 journal: vdsm throttled WARN MOM not available.
> Aug 30 05:55:52 lago-basic-suite-master-host-0 journal: vdsm throttled WARN MOM not available, KSM stats will be missing.
> Aug 30 05:55:53 lago-basic-suite-master-host-0 systemd: Started Session 10 of user root.
> Aug 30 05:55:53 lago-basic-suite-master-host-0 systemd-logind: New session 10 of user root.
> Aug 30 05:55:53 lago-basic-suite-master-host-0 systemd: Starting Session 10 of user root.
> Aug 30 05:55:53 lago-basic-suite-master-host-0 python: ansible-setup Invoked with filter=* gather_subset=['all'] fact_path=/etc/ansible/facts.d gather_timeout=10
> Aug 30 05:55:54 lago-basic-suite-master-host-0 python: ansible-command Invoked with warn=True executable=None _uses_shell=False _raw_params=bash -c "rpm -qi vdsm | grep -oE 'Version\\s+:\\s+[0-9\\.]+' | awk '{print $3}'" removes=None creates=None chdir=None
> Aug 30 05:55:54 lago-basic-suite-master-host-0 python: ansible-systemd Invoked with no_block=False name=libvirt-guests enabled=True daemon_reload=False state=started user=False masked=None
> Aug 30 05:55:54 lago-basic-suite-master-host-0 systemd: Reloading.
> Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Cannot add dependency job for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
> Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Stopped MOM instance configured for VDSM purposes.
> Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Stopping Virtual Desktop Server Manager...
> Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Starting Suspend Active Libvirt Guests...
> Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Started Suspend Active Libvirt Guests.
> Aug 30 05:55:55 lago-basic-suite-master-host-0 journal: libvirt version: 2.0.0, package: 10.el7_3.9 (CentOS BuildSystem <http://bugs.centos.org>, 2017-05-25-20:52:28, c1bm.rdu2.centos.org)
> Aug 30 05:55:55 lago-basic-suite-master-host-0 journal: hostname: lago-basic-suite-master-host-0.lago.local
> Aug 30 05:55:55 lago-basic-suite-master-host-0 vdsmd_init_common.sh: vdsm: Running run_final_hooks
> Aug 30 05:55:55 lago-basic-suite-master-host-0 journal: End of file while reading data: Input/output error
> Aug 30 05:55:55 lago-basic-suite-master-host-0 systemd: Stopped Virtual Desktop Server Manager.
>
> suggesting that systemd stopped vdsmd because mom did not start on
> time. We have an extremely ugly two-way dependency between the two
> services: each depends on the other one. I am guessing that systemd's
> grace period for such a cycle expired, and the couple was taken down.
>
> Does anybody have a guess what could cause such slowness?
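
Regarding the two-way dependency: it can be seen directly on a host by
dumping the dependency-related properties of both units. A small
diagnostic sketch, assuming only the unit names that appear in this
thread:

import subprocess

# Print the dependency-related properties of both units; the cycle
# shows up as each unit listing the other. systemctl emits one
# 'Property=value' line per requested property.
UNITS = ['vdsmd.service', 'mom-vdsm.service']
PROPS = 'Requires,Wants,BindsTo,After,Before'

for unit in UNITS:
    print('--- %s ---' % unit)
    subprocess.check_call(['systemctl', 'show', '-p', PROPS, unit])

The same information is available via 'systemctl list-dependencies
vdsmd.service', but the property dump makes the direction of each edge
explicit.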

Even if the LLDP errors do not have a negative impact on the functional
behavior of supervdsmd, they do affect the timing of supervdsmd's
initialization. I will post a patch today that moves the LLDP
initialization into a separate thread, so that it runs asynchronously.
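
Roughly, the idea looks like the following. This is only a sketch built
around the _lldp_init and enable_lldp_on_iface names from the traceback
above; the import alias and the init_lldp_async wrapper are
illustrative, not the actual patch:

import logging
import threading

from vdsm.network.lldp import lldpad as Lldp  # module path per the traceback

def _lldp_init(net_devices):
    # Same behavior as today: failures are logged and non-fatal. Only
    # the thread they run on changes.
    for device in net_devices:
        try:
            Lldp.enable_lldp_on_iface(device)
        except Exception:
            logging.exception('Failed to enable LLDP on %s', device)

def init_lldp_async(net_devices):
    # Run the LLDP setup off the main thread so slow or timing-out
    # lldptool calls cannot delay the rest of the supervdsmd startup.
    t = threading.Thread(target=_lldp_init, args=(net_devices,),
                         name='lldp-init')
    t.daemon = True
    t.start()
    return t

The trade-off is that supervdsmd may report itself as started before
LLDP is actually enabled, which seems acceptable given that the errors
are already non-fatal.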

_______________________________________________
Devel mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/devel