On Mon, Jan 18, 2021 at 9:08 AM Yedidyah Bar David <[email protected]> wrote:

> On Sun, Jan 17, 2021 at 3:11 PM Yedidyah Bar David <[email protected]> wrote:
> >
> > On Thu, Jan 14, 2021 at 1:41 PM Yedidyah Bar David <[email protected]> wrote:
> > >
> > > On Thu, Jan 14, 2021 at 8:35 AM Yedidyah Bar David <[email protected]> wrote:
> > > >
> > > > On Wed, Jan 13, 2021 at 5:34 PM Yedidyah Bar David <[email protected]> wrote:
> > > > >
> > > > > On Wed, Jan 13, 2021 at 2:48 PM Yedidyah Bar David <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, Jan 13, 2021 at 1:57 PM Marcin Sobczyk <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > my guess is it's selinux-related.
> > > > > > >
> > > > > > > > Unfortunately I can't find any meaningful errors in audit.log in a
> > > > > > > > scenario where host deployment fails.
> > > > > > > > However switching selinux to permissive mode before adding hosts makes
> > > > > > > > the problem go away, so it's probably not an error somewhere in logic.
> > > > > >
> > > > > > It's getting weirder: Under strace, it succeeds:
> > > > > >
> > > > > > https://gerrit.ovirt.org/c/ovirt-system-tests/+/112948
> > > > > >
> > > > > > > (Can't see the actual log, as I didn't add '-A', so it was overwritten
> > > > > > > on restart...)
> > > > >
> > > > > After updating it to use '-A' it indeed shows that it worked:
> > > > >
> > > > > 43664 14:16:55.997639 access("/etc/pki/ovirt-engine/requests", W_OK
> > > > > <unfinished ...>
> > > > > 43664 14:16:55.997695 <... access resumed>) = 0
> > > > >
> > > > > Weird.
> > > > >
> > > > > Now ran in parallel 'ci test' for this patch and another one from
> > > > > master, for comparison:
> > > >
> > > > Again, the same:
> > > >
> > > > >
> > > > >
> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14916/
> > > >
> > > > With strace, passed,
> > > >
> > > > >
> > > > >
> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1883/
> > > >
> > > > Without strace, failed.
> > > >
> > > > Last nightly run that passed [1] used:
> > > >
> > > > ost-images-el8-host-installed-1-202101100446.x86_64
> > > > ovirt-engine-appliance-4.4-20210109182828.1.el8.x86_64
> > > >
> > > > > Trying now with these - not sure it's possible to put specific versions inside
> > > > > automation/*packages, let's see:
> > > >
> > > > https://gerrit.ovirt.org/c/ovirt-system-tests/+/112977
> > >
> > > > Indeed, with a fixed ost-images and removing updates, it passes. The network suite
> > > > failed, but he-basic passed:
> > >
> > >
> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14920/artifact/ci_build_summary.html
> > >
> > > So I am quite certain this is an OS issue. Not sure how we do not see
> > > this in basic-suite.
> > > > Perhaps it's related to nested-kvm, or to load/slowness caused by that? Weird.
> > > >
> > > > When this fails, we do not collect all engine's /var/log, only
> > > messages and ovirt-engine/ .
> > > So it's not easy to get a list of the packages that were updated.
> > >
> > > Pushed now:
> > >
> > > https://github.com/oVirt/ovirt-ansible-collection/pull/202
> > >
> > > to get all of engine's /var/log, and ran manual HE job with it:
> > >
> > >
> https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7680/
> >
> > This one I accidentally ran with the wrong repo, then ran another one
> > with the correct repo [1],
> > But:
> >
> > 1. The repo wasn't used. Emailed about this in a separate thread: "manual
> > job does not use custom repo"
> >
> > 2. It passed! Being what seems like a heisenbug, I understand why when
> > you run it under strace it
> > works differently. But even if you just intend to collect more logs it
> > also causes it to behave
> > differently? :-) This does not mean that "problem solved" - latest
> > nightly run [2] did fail with
> > the same error.
>
> Status:
>
> 1. he-basic-suite is still failing.
>
> 2. Patch to collect all of /var/log from the engine merged.
>
> Dana, can you please update? Have you made any progress?
>
> IMO it's an OS bug. If Marcin says it's an selinux issue, I do not argue
> :-).
> So, how do we continue?
>

Switching to CentOS Stream development/testing is a big effort; I'm not
sure we can do this and still deliver all the RFEs/bugs planned for 4.4.5
...

>
> >
> > [1] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7681/
> > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1887/
> >
> > >
> > >
> > > >
> > > > [1] https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1879/
> > > --
> > > Didi
> >
> >
> >
> > --
> > Didi
>
>
>
> --
> Didi
>
>

-- 
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.