On Mon, Jan 18, 2021 at 11:19 AM Marcin Sobczyk <[email protected]> wrote:
>
>
>
> On 1/18/21 9:58 AM, Yedidyah Bar David wrote:
> > On Mon, Jan 18, 2021 at 10:53 AM Martin Perina <[email protected]> wrote:
> >>
> >>
> >> On Mon, Jan 18, 2021 at 9:08 AM Yedidyah Bar David <[email protected]> wrote:
> >>> On Sun, Jan 17, 2021 at 3:11 PM Yedidyah Bar David <[email protected]> wrote:
> >>>> On Thu, Jan 14, 2021 at 1:41 PM Yedidyah Bar David <[email protected]> wrote:
> >>>>> On Thu, Jan 14, 2021 at 8:35 AM Yedidyah Bar David <[email protected]> wrote:
> >>>>>> On Wed, Jan 13, 2021 at 5:34 PM Yedidyah Bar David <[email protected]> wrote:
> >>>>>>> On Wed, Jan 13, 2021 at 2:48 PM Yedidyah Bar David <[email protected]> wrote:
> >>>>>>>> On Wed, Jan 13, 2021 at 1:57 PM Marcin Sobczyk <[email protected]> wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> my guess is it's selinux-related.
> >>>>>>>>>
> >>>>>>>>> Unfortunately I can't find any meaningful errors in audit.log in a
> >>>>>>>>> scenario where host deployment fails.
> >>>>>>>>> However, switching selinux to permissive mode before adding hosts
> >>>>>>>>> makes the problem go away, so it's probably not an error somewhere
> >>>>>>>>> in the logic.
> >>>>>>>> It's getting weirder: Under strace, it succeeds:
> >>>>>>>>
> >>>>>>>> https://gerrit.ovirt.org/c/ovirt-system-tests/+/112948
> >>>>>>>>
> >>>>>>>> (Can't see the actual log, as I didn't add '-A', so it was overwritten
> >>>>>>>> on restart...)
> >>>>>>> After updating it to use '-A' it indeed shows that it worked:
> >>>>>>>
> >>>>>>> 43664 14:16:55.997639 access("/etc/pki/ovirt-engine/requests", W_OK
> >>>>>>> <unfinished ...>
> >>>>>>> 43664 14:16:55.997695 <... access resumed>) = 0
> >>>>>>>
> >>>>>>> Weird.
> >>>>>>>
> >>>>>>> Now ran in parallel 'ci test' for this patch and another one from
> >>>>>>> master, for comparison:
> >>>>>> Again, the same:
> >>>>>>
> >>>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14916/
> >>>>>> With strace, passed,
> >>>>>>
> >>>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_he-basic-suite-master/1883/
> >>>>>> Without strace, failed.
> >>>>>>
> >>>>>> Last nightly run that passed [1] used:
> >>>>>>
> >>>>>> ost-images-el8-host-installed-1-202101100446.x86_64
> >>>>>> ovirt-engine-appliance-4.4-20210109182828.1.el8.x86_64
> >>>>>>
> >>>>>> Trying now with these - not sure it's possible to put specific versions
> >>>>>> inside automation/*packages, let's see:
> >>>>>>
> >>>>>> https://gerrit.ovirt.org/c/ovirt-system-tests/+/112977
> >>>>> Indeed, with a fixed ost-images and removing updates, it passes. The
> >>>>> network suite failed, but he-basic passed:
> >>>>>
> >>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/14920/artifact/ci_build_summary.html
> >>>>>
> >>>>> So I am quite certain this is an OS issue. Not sure how we do not see
> >>>>> this in basic-suite.
> >>>>> Perhaps it's related to nested-kvm, or to load/slowness caused by that? 
> >>>>> Weird.
> >>>>>
> >>>>> When this fails, we do not collect all of the engine's /var/log, only
> >>>>> messages and ovirt-engine/ .
> >>>>> So it's not easy to get a list of the packages that were updated.
> >>>>>
> >>>>> Pushed now:
> >>>>>
> >>>>> https://github.com/oVirt/ovirt-ansible-collection/pull/202
> >>>>>
> >>>>> to get all of engine's /var/log, and ran manual HE job with it:
> >>>>>
> >>>>> https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/7680/
> >>>> This one I accidentally ran with the wrong repo, then ran another one
> >>>> with the correct repo [1],
> >>>> But:
> >>>>
> >>>> 1. The repo wasn't used. Emailed about this in a separate thread: "manual
> >>>> job does not use custom repo"
> >>>>
> >>>> 2. It passed! Being what seems like a heisenbug, I understand why it works
> >>>> differently when you run it under strace. But even if you just intend to
> >>>> collect more logs, it also causes it to behave differently? :-) This does
> >>>> not mean "problem solved" - the latest nightly run [2] did fail with the
> >>>> same error.
> >>> Status:
> >>>
> >>> 1. he-basic-suite is still failing.
> >>>
> >>> 2. Patch to collect all of /var/log from the engine merged.
> >>>
> >>> Dana, can you please update? Did you have any progress?
> >>>
> >>> IMO it's an OS bug. If Marcin says it's an selinux issue, I do not argue :-).
> >>> So, how do we continue?
> >>
> >> Switching to CentOS Stream development/testing is a big effort, I'm not sure
> >> we can do this and still deliver all the RFEs/bugs planned for 4.4.5 ...
> +1
> > IMO we should now revert appliance and node to CentOS 8.3, and then
> > continue the discussion.
> > Having he-basic-suite broken for a week is too much.
> +1 The testing infrastructure for Stream is here, but if it doesn't work
> yet then let's stick to the plan and focus on 8.3.

Just to conclude the original issue - a workaround was found; the root cause is
still under investigation. I commented on the bugs (oVirt and Stream) with details.

Best regards,
-- 
Didi
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/7HU6SONUCBHPFZR5DB74TDD6OBINZNHE/
