On Tue, Dec 17, 2019 at 9:55 AM Anton Marchukov <amarc...@redhat.com> wrote:
> Hi.
>
> We do watch those, and this one was reported by Dafna, though the devel
> list was not included for some reason (usually we do include it). We
> strive to follow up on them daily, but sometimes we lag behind.
>
> It would be good to notify the patch owner that the system identified
> their patch as a possible cause (by bisection), but initially it was not
> done like that. Sometimes the reporting is misleading (e.g. changes in
> the external repos we use are not visible to bisection, and infra
> problems are something we fix ourselves). Still, I am OK with trying to
> CC the patch owner, especially since we are working on gating as a
> long-term solution, and IMO this is a step in the right direction.

Please let's try that; those alerts need more visibility. (A rough sketch
of the owner lookup is at the end of this mail.)

> Anton.
>
> On 17 Dec 2019, at 09:43, Yedidyah Bar David <d...@redhat.com> wrote:
> >
> > On Tue, Dec 17, 2019 at 10:11 AM Yedidyah Bar David <d...@redhat.com> wrote:
> >>
> >> Hi all,
> >>
> >> $subject. [1] has
> >> ovirt-engine-4.4.0-0.0.master.20191204120550.git04d5d05.el7.noarch.
> >>
> >> Tried to look around, and I have a few notes/questions:
> >>
> >> 1. The last successful run of [2] is 3 days old, but apparently it
> >> wasn't published. Any idea why?
> >>
> >> 2. Failed runs of [2] are reported to infra, with emails such as:
> >>
> >> [CQ]: 105472, 5 (ovirt-engine) failed "ovirt-master" system tests, but
> >> isn't the failure root cause
> >>
> >> Is anyone monitoring these?
> >>
> >> Is this the only alerting that CI generates on such failures?
> >>
> >> If the answer to the first is No and to the second is Yes, then we need
> >> someone/something to start monitoring. This was discussed a lot, but I
> >> do not see any change. Ideally, such alerts should be To'ed or Cc'ed to
> >> the author and reviewers of the patch that CI found to be guilty (which
> >> might be wrong, but that's not the point). Do we plan to have something
> >> like this? Any idea when it will be ready?
> >>
> >> 3. I looked at a few recent failures of [2], specifically [3][4]. Both
> >> seem to have been killed after a timeout while running 'engine-config'.
> >> For [3] that's clear, see [5]:
> >>
> >> 2019-12-16 17:11:44,766::log_utils.py::__exit__::611::lago.ssh::DEBUG::end
> >> task:fb6611dc-55bb-4251-aeda-2578b2ec83a2:Get ssh client for
> >> lago-basic-suite-master-engine:
> >> 2019-12-16 17:11:44,931::ssh.py::ssh::58::lago.ssh::DEBUG::Running
> >> 22e2b6b6 on lago-basic-suite-master-engine: engine-config --set
> >> VdsmUseNmstate=true
> >> 2019-12-16 19:55:21,965::cmd.py::exit_handler::921::cli::DEBUG::signal
> >> 15 was caught
> >>
> >> I can't find the stdout/stderr of engine-config, so it's hard to tell
> >> whether it printed anything helpful for understanding why it was stuck.
> >>
> >> It's hard to tell for [4], because it has very few artifacts collected
> >> (no idea why; notably, no lago.log), but [6] does show:
> >>
> >>   # initialize_engine: Success (in 0:04:00)
> >>   # engine_config:
> >>     * Collect artifacts:
> >>       - [Thread-34] lago-basic-suite-master-engine: ERROR (in 0:00:04)
> >>     * Collect artifacts: ERROR (in 0:00:04)
> >>   # engine_config: ERROR (in 2:42:57)
> >> /bin/bash: line 31:  5225 Killed  ${_STDCI_TIMEOUT_CMD} "3h" "$script_path" < /dev/null
> >>
> >> If I run 'engine-config --set VdsmUseNmstate=true' on my
> >> 20191204120550.git04d5d05 engine, it returns quickly.
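> >>
> >> Next time, to make sure something ends up in the artifacts, perhaps the
> >> suite could wrap the call along these lines (just a rough sketch; the
> >> timeout value and log path are arbitrary, and I did not check where in
> >> the suite this would best hook in):
> >>
> >>   # Run engine-config under a timeout and keep its output, so that a
> >>   # future hang at least leaves its last prompt/output behind.
> >>   # 'timeout' is from coreutils; rc=124 means it killed the command.
> >>   timeout 600 engine-config --set VdsmUseNmstate=true \
> >>       > /var/log/engine-config-nmstate.log 2>&1 < /dev/null
> >>   rc=$?
> >>   cat /var/log/engine-config-nmstate.log
> >>   echo "engine-config exited with ${rc}"
> >>
> >> With '< /dev/null', a command that blocks reading stdin (like the
> >> version prompt mentioned below) should hit EOF and fail fast instead
> >> of hanging for hours.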
> >>
> >> Tried also adding a repo pointing at the last successful run of [7],
> >> which is currently [8], and it prompts me to input a version, probably
> >> as a result of [9]. Ales/Martin, can you please have a look? Thanks.
> >
> > Something like this might be enough, please take over:
> >
> > https://gerrit.ovirt.org/105784
> >
> > But the main point of my mail was the first two points.
> >
> >> [1] https://resources.ovirt.org/pub/ovirt-master-snapshot/rpm/el7/noarch/
> >> [2] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/
> >> [3] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17768/
> >> [4] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17761/
> >> [5] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17768/artifact/basic-suite.el7.x86_64/lago_logs/lago.log
> >> [6] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17761/artifact/basic-suite.el7.x86_64/mock_logs/script/stdout_stderr.log
> >> [7] https://jenkins.ovirt.org/job/ovirt-engine_standard-on-merge/
> >> [8] https://jenkins.ovirt.org/job/ovirt-engine_standard-on-merge/384/
> >> [9] https://gerrit.ovirt.org/105440
> >> --
> >> Didi
> >
> > --
> > Didi
>
> --
> Anton Marchukov
> Associate Manager - RHV DevOps - Red Hat

--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.
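PS: Regarding CCing the patch owner: the CQ mail subject already carries
the change number, so the notifier could look the owner up in Gerrit. A
rough sketch of the idea in shell (the subject format is taken from the
example above; the endpoint is Gerrit's standard REST API; how our
notifier would actually wire this in is an assumption):

  # Extract the change number from a CQ subject line and ask Gerrit who
  # owns that change. Gerrit prefixes its JSON responses with a ")]}'"
  # guard line, which 'tail -n +2' strips before parsing.
  subject="[CQ]: 105472, 5 (ovirt-engine) failed \"ovirt-master\" system tests, but isn't the failure root cause"
  change=$(echo "$subject" | sed -n 's/^\[CQ\]: \([0-9]\+\),.*/\1/p')
  owner=$(curl -s "https://gerrit.ovirt.org/changes/${change}/detail" \
          | tail -n +2 \
          | python -c 'import json,sys; print(json.load(sys.stdin)["owner"]["email"])')
  echo "would CC: ${owner}"

Getting the reviewers too would need one more call (the /reviewers
endpoint on the same change), but the principle is the same.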