On Tue, Dec 17, 2019 at 9:55 AM Anton Marchukov <amarc...@redhat.com> wrote:

> Hi.
>
> We do watch those, and this one was reported by Dafna, though the devel list
> was not included for some reason (usually we do include it). We strive to
> follow up on these daily, but sometimes we lag behind.
>
> It would be good to notify the patch owner that the system has identified
> their patch as a possible cause (by bisection), but initially it was not done
> like that. Sometimes the reporting is misleading (e.g. changes in the
> external repos we use are not visible to the bisection, and infra problems
> are something we fix ourselves). Still, I am OK with trying to CC the patch
> owner, especially since we are working on gating as a long-term solution and
> IMO this is a step in the right direction.
>

Please let's try that; those alerts need to be raised more visibly.
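
A rough sketch of what the "CC the patch owner" part could look like; the
Gerrit REST endpoints below are real, but the script itself and the way it
would be wired into the CQ alert mail are my assumptions, not something that
exists today:

# Sketch: given the change number from a "[CQ]: 105472, 5 (ovirt-engine)
# failed ..." subject, look up whom the alert should CC.
import json
import urllib.request

GERRIT = "https://gerrit.ovirt.org"

def gerrit_get(path):
    # Gerrit prefixes JSON bodies with ")]}'" to prevent XSSI; strip it.
    with urllib.request.urlopen(GERRIT + path) as resp:
        return json.loads(resp.read().decode("utf-8").lstrip(")]}'\n"))

def alert_recipients(change_number):
    change = gerrit_get("/changes/%d/detail" % change_number)
    reviewers = gerrit_get("/changes/%d/reviewers/" % change_number)
    emails = {change["owner"].get("email")}
    emails.update(r.get("email") for r in reviewers)
    return sorted(e for e in emails if e)

if __name__ == "__main__":
    print(alert_recipients(105472))

Whatever comes back would still need filtering (CI/bot accounts), and the
bisection can of course blame the wrong patch, as Anton notes above, so this
would be a CC in addition to infra/devel, not instead of them.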

>
> Anton.
>
> > On 17 Dec 2019, at 09:43, Yedidyah Bar David <d...@redhat.com> wrote:
> >
> > On Tue, Dec 17, 2019 at 10:11 AM Yedidyah Bar David <d...@redhat.com> wrote:
> >>
> >> Hi all,
> >>
> >> $subject. [1] has
> >> ovirt-engine-4.4.0-0.0.master.20191204120550.git04d5d05.el7.noarch .
> >>
> >> Tried to look around, and I have a few notes/questions:
> >>
> >> 1. Last successful run of [2] is 3 days old, but apparently it wasn't
> >> published. Any idea why?
> >>
> >> 2. Failed runs of [2] are reported to infra, with emails such as:
> >>
> >> [CQ]: 105472, 5 (ovirt-engine) failed "ovirt-master" system tests, but
> >> isn't the failure root cause
> >>
> >> Is anyone monitoring these?
> >>
> >> Is this the only alerting that CI generates on such failures?
> >>
> >> If the answer to the first is No and to the second is Yes, then we need someone/something to
> >> start monitoring. This was discussed a lot, but I do not see any
> >> change. Ideally, such alerts should be To'ed or Cc'ed to the author
> >> and reviewers of the patch that CI found to be guilty (which might be
> >> wrong, that's not the point). Do we plan to have something like this?
> >> Any idea when it will be ready?
> >>
> >> 3. I looked at a few recent failures of [2], specifically [3][4]. Both
> >> seem to have been killed after a timeout, while running
> >> 'engine-config'. For [3] that's clear, see [5]:
> >>
> >> 2019-12-16 17:11:44,766::log_utils.py::__exit__::611::lago.ssh::DEBUG::end task:fb6611dc-55bb-4251-aeda-2578b2ec83a2:Get ssh client for lago-basic-suite-master-engine:
> >> 2019-12-16 17:11:44,931::ssh.py::ssh::58::lago.ssh::DEBUG::Running 22e2b6b6 on lago-basic-suite-master-engine: engine-config --set VdsmUseNmstate=true
> >> 2019-12-16 19:55:21,965::cmd.py::exit_handler::921::cli::DEBUG::signal 15 was caught
> >>
> >> I can't find the stdout/stderr of engine-config, so it's hard to tell whether
> >> it output anything helpful for understanding why it was stuck.
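> >>
> >> (Side note, in case anyone wants to repeat this kind of analysis: the hang is
> >> visible from the timestamps alone, so a rough sketch like the one below, which
> >> just looks for the largest jump between consecutive timestamped lines in
> >> lago.log, points straight at the stuck command. It assumes the
> >> "YYYY-MM-DD HH:MM:SS,mmm::..." format quoted above.)
> >>
> >> import re
> >> import sys
> >> from datetime import datetime
> >>
> >> STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})::")
> >>
> >> def largest_gap(path):
> >>     worst, prev_ts, prev_line = None, None, None
> >>     for line in open(path, errors="replace"):
> >>         m = STAMP.match(line)
> >>         if not m:
> >>             continue
> >>         ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
> >>         if prev_ts is not None and (worst is None or ts - prev_ts > worst[0]):
> >>             worst = (ts - prev_ts, prev_line.rstrip(), line.rstrip())
> >>         prev_ts, prev_line = ts, line
> >>     return worst
> >>
> >> if __name__ == "__main__":
> >>     gap, before, after = largest_gap(sys.argv[1])
> >>     print(gap)
> >>     print(before)
> >>     print(after)
> >>
> >> For [5] this should report a gap of about 2:43, with the engine-config line
> >> before it and the "signal 15 was caught" line after it.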
> >>
> >> It's harder to tell for [4], because it has very few artifacts collected
> >> (no idea why), notably no lago.log, but [6] does show:
> >>
> >>   # initialize_engine:  Success (in 0:04:00)
> >>   # engine_config:
> >>     * Collect artifacts:
> >>       - [Thread-34] lago-basic-suite-master-engine: ERROR (in 0:00:04)
> >>     * Collect artifacts:  ERROR (in 0:00:04)
> >>   # engine_config:  ERROR (in 2:42:57)
> >> /bin/bash: line 31:  5225 Killed    ${_STDCI_TIMEOUT_CMD} "3h" "$script_path" < /dev/null
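> >>
> >> (My reading, not a confirmed root cause: signal 15 is SIGTERM, so the "signal
> >> 15 was caught" line in [5] and the "Killed ... 3h" line above both look like
> >> the 3h wall-clock limit firing, rather than engine-config failing on its own.
> >> A toy reproduction of what that looks like from the wrapped process's side,
> >> assuming the STDCI wrapper behaves like coreutils timeout:)
> >>
> >> import signal
> >> import sys
> >> import time
> >>
> >> def exit_handler(signum, frame):
> >>     print("signal %d was caught" % signum)  # same wording as lago's cmd.py
> >>     sys.exit(128 + signum)
> >>
> >> signal.signal(signal.SIGTERM, exit_handler)
> >> time.sleep(3600)  # stand-in for the stuck engine-config step
> >>
> >> Run as "timeout 5 python3 repro.py" it prints "signal 15 was caught" after
> >> five seconds.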
> >>
> >> If I run 'engine-config --set VdsmUseNmstate=true' on my
> >> 20191204120550.git04d5d05 engine, it returns quickly.
> >>
> >> I also tried adding a repo pointing at the last successful run of [7], which
> >> is currently [8], and then engine-config prompts me to input a version,
> >> probably as a result of [9]. Ales/Martin, can you please have a look? Thanks.
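> >>
> >> (Speculation, I have not checked what the actual fix needs to be: if I
> >> remember the tool correctly, engine-config accepts a --cver option naming the
> >> config version, which would avoid the interactive prompt. A sketch of the
> >> non-interactive call; the "4.4" value is a placeholder, not necessarily what
> >> OST should pass:)
> >>
> >> import subprocess
> >>
> >> def set_vdsm_use_nmstate(cver="4.4"):
> >>     # Runs on the engine host itself; OST would do the same over lago ssh.
> >>     return subprocess.run(
> >>         ["engine-config", "--set", "VdsmUseNmstate=true", "--cver=%s" % cver],
> >>         check=True,
> >>     )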
> >
> > Something like this might be enough, please take over:
> >
> > https://gerrit.ovirt.org/105784
> >
> > But the main point of my mail was the first two items.
> >
> >>
> >> [1] https://resources.ovirt.org/pub/ovirt-master-snapshot/rpm/el7/noarch/
> >> [2] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/
> >> [3] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17768/
> >> [4] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17761/
> >> [5] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17768/artifact/basic-suite.el7.x86_64/lago_logs/lago.log
> >> [6] https://jenkins.ovirt.org/job/ovirt-master_change-queue-tester/17761/artifact/basic-suite.el7.x86_64/mock_logs/script/stdout_stderr.log
> >> [7] https://jenkins.ovirt.org/job/ovirt-engine_standard-on-merge/
> >> [8] https://jenkins.ovirt.org/job/ovirt-engine_standard-on-merge/384/
> >> [9] https://gerrit.ovirt.org/105440
> >> --
> >> Didi
> >
> >
> >
> > --
> > Didi
>
> --
> Anton Marchukov
> Associate Manager - RHV DevOps - Red Hat
>

-- 
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.