On Mon, Jan 18, 2021 at 10:53 AM Martin Perina <[email protected]> wrote:
>
> On Mon, Jan 18, 2021 at 9:08 AM Yedidyah Bar David <[email protected]> wrote:
>>
>> On Sun, Jan 17, 2021 at 3:11 PM Yedidyah Bar David <[email protected]> wrote:
>> >
>> > On Thu, Jan 14, 2021 at 1:41 PM Yedidyah Bar David <[email protected]> wrote:
>> > >
>> > > On Thu, Jan 14, 2021 at 8:35 AM Yedidyah Bar David <[email protected]> wrote:
>> > > >
>> > > > On Wed, Jan 13, 2021 at 5:34 PM Yedidyah Bar David <[email protected]> wrote:
>> > > > >
>> > > > > On Wed, Jan 13, 2021 at 2:48 PM Yedidyah Bar David <[email protected]> wrote:
>> > > > > >
>> > > > > > On Wed, Jan 13, 2021 at 1:57 PM Marcin Sobczyk <[email protected]> wrote:
>> > > > > > >
>> > > > > > > Hi,
>> > > > > > >
>> > > > > > > my guess is it's SELinux-related.
>> > > > > > >
>> > > > > > > Unfortunately, I can't find any meaningful errors in audit.log in a scenario where host deployment fails.
>> > > > > > > However, switching SELinux to permissive mode before adding hosts makes the problem go away, so it's probably not an error somewhere in the logic.
>> > > > > >
>> > > > > > It's getting weirder: under strace, it succeeds:
>> > > > > >
>> > > > > > https://gerrit.ovirt.org/c/ovirt-system-tests/+/112948
>> > > > > >
>> > > > > > (I can't see the actual log, as I didn't add '-A', so it was overwritten on restart...)
>> > > > >
>> > > > > After updating it to use '-A', it indeed shows that it worked:
>> > > > >
>> > > > > 43664 14:16:55.997639 access("/etc/pki/ovirt-engine/requests", W_OK <unfinished ...>
>> > > > > 43664 14:16:55.997695 <... access resumed>) = 0
>> > > > >
>> > > > > Weird.
>> > > > >
>> > > > > Now I ran 'ci test' in parallel for this patch and for another one from master, for comparison:
>> > > >
>> > > > Again, the same:
>> > > >
>> > > > > https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14916/
>> > > >
>> > > > With strace, passed.
>> > > >
>> > > > > https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1883/
>> > > >
>> > > > Without strace, failed.
>> > > >
>> > > > The last nightly run that passed [1] used:
>> > > >
>> > > > ost-images-el8-host-installed-1-202101100446.x86_64
>> > > > ovirt-engine-appliance-4.4-20210109182828.1.el8.x86_64
>> > > >
>> > > > Trying these now. I'm not sure it's possible to put specific versions inside automation/*packages; let's see:
>> > > >
>> > > > https://gerrit.ovirt.org/c/ovirt-system-tests/+/112977
>> > >
>> > > Indeed, with a fixed ost-images and updates removed, it passes. The network suite failed, but he-basic passed:
>> > >
>> > > https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14920/artifact/ci_build_summary.html
>> > >
>> > > So I am quite certain this is an OS issue. I'm not sure why we do not see this in basic-suite.
>> > > Perhaps it's related to nested KVM, or to the load/slowness it causes?
>> > > Weird.
>> > >
>> > > When this fails, we do not collect all of the engine's /var/log, only messages and ovirt-engine/.
>> > > So it's not easy to get a list of the packages that were updated.
>> > >
>> > > Pushed now:
>> > >
>> > > https://github.com/oVirt/ovirt-ansible-collection/pull/202
>> > >
>> > > to get all of the engine's /var/log, and ran a manual HE job with it:
>> > >
>> > > https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7680/
>> >
>> > I accidentally ran this one with the wrong repo, then ran another one with the correct repo [1].
>> > But:
>> >
>> > 1. The repo wasn't used.
>> >    I emailed about this in a separate thread: "manual job does not use custom repo".
>> >
>> > 2. It passed! Since this seems to be a heisenbug, I understand why it works differently when you run it under strace. But even when you just intend to collect more logs, that also makes it behave differently? :-) This does not mean the problem is solved: the latest nightly run [2] did fail with the same error.
>>
>> Status:
>>
>> 1. he-basic-suite is still failing.
>>
>> 2. The patch to collect all of /var/log from the engine was merged.
>>
>> Dana, can you please update? Have you made any progress?
>>
>> IMO it's an OS bug. If Marcin says it's an SELinux issue, I won't argue :-).
>> So, how do we continue?
>
> Switching to CentOS Stream development/testing is a big effort; I'm not sure we can do this and still deliver all the RFEs/bugs planned for 4.4.5 ...

IMO we should now revert the appliance and node to CentOS 8.3, and then continue the discussion. Having he-basic-suite broken for a week is too much.

>>
>> >
>> > [1] https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7681/
>> > [2] https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1887/
>> >
>> > > > [1] https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1879/
>> > > --
>> > > Didi
>> >
>> > --
>> > Didi
>>
>> --
>> Didi
>
> --
> Martin Perina
> Manager, Software Engineering
> Red Hat Czech s.r.o.

--
Didi
_______________________________________________
Devel mailing list -- [email protected]
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/NVZINXBTO4UM6QGDJXRBTADDT5KT44GH/
