On Mon, Jul 23, 2018 at 1:32 PM, Dafna Ron <d...@redhat.com> wrote:

> Hi,
>
> The issue seems to be that host-1 stopped responding and I can see some
> fluentd errors which we should look at.
>
> Jira opened to track this issue:
> https://ovirt-jira.atlassian.net/browse/OVIRT-2363
>
> Martin, I also added you to the Jira - can you please have a look?
>
> Error from node-1 messages log:
> Jul 23 05:09:14 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:14 -0400 [warn]: detached forwarding server 'lago-basic-suite-master-engine' host="lago-basic-suite-master-engine" port=24224 phi=16.275347714068506
> Jul 23 05:09:14 lago-basic-suite-master-host-1 fluentd: ["lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine"]
> Jul 23 05:09:14 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:14 -0400 fluent.warn: {"host":"lago-basic-suite-master-engine","port":24224,"phi":16.275347714068506,"message":"detached forwarding server 'lago-basic-suite-master-engine' host=\"lago-basic-suite-master-engine\" port=24224 phi=16.275347714068506"}
> Jul 23 05:09:15 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:15 -0400 [warn]: detached forwarding server 'lago-basic-suite-master-engine' host="lago-basic-suite-master-engine" port=24224 phi=16.70444149784817
> Jul 23 05:09:15 lago-basic-suite-master-host-1 fluentd: ["lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine", "lago-basic-suite-master-engine"]
> Jul 23 05:09:15 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:15 -0400 fluent.warn: {"host":"lago-basic-suite-master-engine","port":24224,"phi":16.70444149784817,"message":"detached forwarding server 'lago-basic-suite-master-engine' host=\"lago-basic-suite-master-engine\" port=24224 phi=16.70444149784817"}
> Jul 23 05:09:23 lago-basic-suite-master-host-1 python: ansible-command Invoked with warn=False executable=None _uses_shell=False _raw_params=systemctl is-active 'collectd' removes=None argv=None creates=None chdir=None stdin=None
> Jul 23 05:09:25 lago-basic-suite-master-host-1 systemd-logind: New session 29 of user root.
> Jul 23 05:09:25 lago-basic-suite-master-host-1 systemd: Started Session 29 of user root.
> Jul 23 05:09:25 lago-basic-suite-master-host-1 systemd: Starting Session 29 of user root.
> Jul 23 05:09:25 lago-basic-suite-master-host-1 systemd-logind: Removed session 29.
> Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: failed to flush the buffer. error_class="RuntimeError" error="no nodes are available" plugin_id="object:151a620"
> Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: retry count exceededs limit.
> Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/plugin/out_forward.rb:222:in `write_objects'
> Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/output.rb:490:in `write'
> Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/buffer.rb:354:in `write_chunk'
> Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/buffer.rb:333:in `pop'
> Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/output.rb:342:in `try_flush'
> Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [warn]: /usr/share/gems/gems/fluentd-0.12.42/lib/fluent/output.rb:149:in `run'
> Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 [error]: throwing away old logs.
> Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 fluent.warn: {"error_class":"RuntimeError","error":"no nodes are available","plugin_id":"object:151a620","message":"failed to flush the buffer. error_class=\"RuntimeError\" error=\"no nodes are available\" plugin_id=\"object:151a620\""}
> Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 fluent.warn: {"message":"retry count exceededs limit."}
> Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: 2018-07-23 05:09:27 -0400 fluent.error: {"message":"throwing away old logs."}
>
> Thanks.
> Dafna
>
Hi,

I can see in vdsm.log that it received a termination signal (signal 15, SIGTERM):

2018-07-23 05:24:26,735-0400 INFO (MainThread) [vds] Received signal 15, shutting down (vdsmd:68)

And in /var/log/messages I found that mom was killed:

Jul 23 05:24:16 lago-basic-suite-master-host-1 systemd: Stopping MOM instance configured for VDSM purposes...
...
Jul 23 05:24:26 lago-basic-suite-master-host-1 systemd: mom-vdsm.service stop-sigterm timed out. Killing.
Jul 23 05:24:26 lago-basic-suite-master-host-1 systemd: mom-vdsm.service: main process exited, code=killed, status=9/KILL
Jul 23 05:24:26 lago-basic-suite-master-host-1 systemd: Stopped MOM instance configured for VDSM purposes.
Jul 23 05:24:26 lago-basic-suite-master-host-1 systemd: Unit mom-vdsm.service entered failed state.
Jul 23 05:24:26 lago-basic-suite-master-host-1 systemd: mom-vdsm.service failed.

So Didi/Shirly/Martin, can the fluentd error be related to the mom shutdown? And could this be a cause of the VDSM shutdown?

> On Mon, Jul 23, 2018 at 10:31 AM, oVirt Jenkins <jenk...@ovirt.org> wrote:
>
>> Change 92882,9 (ovirt-engine) is probably the reason behind recent system test
>> failures in the "ovirt-master" change queue and needs to be fixed.
>>
>> This change had been removed from the testing queue. Artifacts build from this
>> change will not be released until it is fixed.
>>
>> For further details about the change see:
>> https://gerrit.ovirt.org/#/c/92882/9
>>
>> For failed test results see:
>> http://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/8764/
>> _______________________________________________
>> Infra mailing list -- in...@ovirt.org
>> To unsubscribe send an email to infra-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
>> List Archives: https://lists.ovirt.org/archives/list/in...@ovirt.org/message/6LYYXSGM4LQSRVSYY3IJEIE64LW27TJM/
>

--
Martin Perina
Associate Manager, Software Engineering
Red Hat Czech s.r.o.
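
To help with the question of whether the fluentd errors and the mom shutdown are related, here is a rough Python sketch that puts the log lines quoted in this thread on one timeline. It is only an illustration: `syslog_time` and the constant names are made up for this example and are not part of any oVirt tooling, the year 2018 is assumed from the fluentd and vdsm timestamps (syslog lines omit it), and month-name parsing assumes an English locale.

```python
from datetime import datetime

# Log lines copied from the excerpts in this thread (wrapped lines joined).
FLUENTD_DROP = ("Jul 23 05:09:27 lago-basic-suite-master-host-1 fluentd: "
                "2018-07-23 05:09:27 -0400 [error]: throwing away old logs.")
MOM_STOP = ("Jul 23 05:24:16 lago-basic-suite-master-host-1 systemd: "
            "Stopping MOM instance configured for VDSM purposes...")
MOM_KILL = ("Jul 23 05:24:26 lago-basic-suite-master-host-1 systemd: "
            "mom-vdsm.service stop-sigterm timed out. Killing.")


def syslog_time(line, year=2018):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp.

    Syslog omits the year, so it is passed in (assumed to be 2018 here).
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


# Gap between fluentd discarding its buffer and systemd starting to stop mom.
drop_to_stop = syslog_time(MOM_STOP) - syslog_time(FLUENTD_DROP)
# Gap between the stop request and systemd escalating to SIGKILL.
stop_to_kill = syslog_time(MOM_KILL) - syslog_time(MOM_STOP)

print("fluentd dropped its buffer", drop_to_stop, "before the mom stop began")
print("systemd killed mom", stop_to_kill, "after asking it to stop")
```

On these lines the first gap is 14 minutes 49 seconds and the second is 10 seconds, which lines up with the vdsm "Received signal 15" entry at 05:24:26. So the fluentd buffer drop happened well before the services were stopped rather than at the same moment, but timing alone does not establish a cause either way.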
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/KXBI2VR5TXH2FRBOS3ASV3YPOTJZ52RB/