About [5] https://bugzilla.redhat.com/show_bug.cgi?id=1917707, I have found
that when executing the http request of the playbook, sometimes in the
response there are some events that are missing. The way we handle the
events is via while loop, where in each loop we continue from the
lastEventId. Since not all events were listed, the lastEventId (which is
returned as total number of events) is smaller than the actual eventId that
was handled. Thus, in the next iteration the last events are handled again.
It seems that this comes from Ansible Runner, and I'm investigating that
area now
I'm not sure if it's the same issue as you get with the logs, though, as
I'm not familiar with this area in the OSTs. Is the creation of the 2 logs
done by different threads?
Regarding the initial error (' directory is not empty.non-zero return
code') I'm not sure, will have to loop more into it. Martin, do you have
any idea?
On Mon, Mar 15, 2021 at 9:59 AM Yedidyah Bar David <[email protected]> wrote:
> On Mon, Mar 15, 2021 at 7:55 AM Yedidyah Bar David <[email protected]>
> wrote:
> >
> > Hi all,
> >
> > This started a few days ago [1] and randomly happens since then:
> >
> > E DEBUG: Configuration:
> > E DEBUG: command: collect
> > E DEBUG: Traceback (most recent call last):
> > E DEBUG: File
> > "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py",
> > line 2067, in <module>
> > E DEBUG: '%s directory is not empty.' %
> (conf["local_tmp_dir"])
> > E DEBUG: Exception: /dev/shm/log directory is not
> > empty.ERROR: /dev/shm/log directory is not empty.non-zero return code
> >
> > Michal tried to fix this by using a random directory but it still fails
> [2]:
> >
> > DEBUG: command: collect
> > DEBUG: Traceback (most recent call last):
> > DEBUG: File
> "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py",
> > line 2067, in <module>
> > DEBUG: '%s directory is not empty.' % (conf["local_tmp_dir"])
> > DEBUG: Exception: /dev/shm/kaN7uY directory is not empty.ERROR:
> > /dev/shm/kaN7uY directory is not empty.non-zero return code
> >
> > Since I suppose that the randomness of mktemp is good enough, it must
> > be something else. Also, the last successful run before [1] used the
> > same OST git commit (same code), so I do not think it's something in
> > OST's code.
> >
> > Any idea?
> >
> > I think I'll push a patch to create and use the directory right before
> > calling ovirt-log-collector, which is probably better in other ways.
>
> My patch [1] still fails, with a somewhat different error message, but
> this made me check further, and while I still do not understand, I have
> this to add:
>
> In the failing runs, ovirt-log-collector is called *twice* in parallel.
> E.g.
> in [2] (the check-patch of [1]):
>
> Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59099]:
> ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p
> /dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors
> --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf
> _uses_shell=True warn=True stdin_add_newline=True
> strip_empty_ends=True argv=None chdir=None executable=None
> creates=None removes=None stdin=None
> Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59124]:
> ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p
> /dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors
> --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf
> _uses_shell=True warn=True stdin_add_newline=True
> strip_empty_ends=True argv=None chdir=None executable=None
> creates=None removes=None stdin=None
>
> It also generates two logs, which you can check/compare.
>
> It's the same for previous ones, e.g. latest nightly [3][4]:
>
> Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59343]:
> ansible-command Invoked with _raw_params=ovirt-log-collector --verbose
> --batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf
> _uses_shell=True warn=True stdin_add_newline=True
> strip_empty_ends=True argv=None chdir=None executable=None
> creates=None removes=None stdin=None
> Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]:
> SELinux is preventing /usr/lib/systemd/systemd from unlink access on
> the sock_file ansible-ssh-lago-basic-suite-master-host-1-22-root. For
> complete SELinux messages run: sealert -l
> d03a8655-9430-4fcf-9892-3b4df1939899
> Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]:
> SELinux is preventing /usr/lib/systemd/systemd from unlink access on
> the sock_file
> ansible-ssh-lago-basic-suite-master-host-1-22-root.#012#012*****
> Plugin catchall (100. confidence) suggests
> **************************#012#012If you believe that systemd should
> be allowed unlink access on the
> ansible-ssh-lago-basic-suite-master-host-1-22-root sock_file by
> default.#012Then you should report this as a bug.#012You can generate
> a local policy module to allow this access.#012Do#012allow this access
> for now by executing:#012# ausearch -c 'systemd' --raw | audit2allow
> -M my-systemd#012# semodule -X 300 -i my-systemd.pp#012
> Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59361]:
> ansible-command Invoked with _raw_params=ovirt-log-collector --verbose
> --batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf
> _uses_shell=True warn=True stdin_add_newline=True
> strip_empty_ends=True argv=None chdir=None executable=None
> creates=None removes=None stdin=None
>
> Any idea what might have caused this to start happening? Perhaps
> a bug in ansible, or ansible-runner? It reminds me of [5].
> Adding Dana and Martin.
>
> I think [5] is quite a serious bug, btw, should be a 4.4.5 blocker.
>
> Best regards,
>
> [1] https://gerrit.ovirt.org/c/ovirt-system-tests/+/113875
>
> [2]
> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/15980/artifact/check-patch.basic_suite_master.el8.x86_64/test_logs/lago-basic-suite-master-engine/var/log/messages/*view*
>
> [3]
> https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/
>
> [4]
> https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/artifact/exported-artifacts/test_logs/lago-basic-suite-master-engine/var/log/messages/*view*
>
> [5] https://bugzilla.redhat.com/show_bug.cgi?id=1917707
>
> >
> > Best regards,
> >
> > [1]
> https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/949/
> >
> > [2]
> https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/
> >
> >
> > --
> > Didi
>
>
>
> --
> Didi
>
>
_______________________________________________
Infra mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/[email protected]/message/FNLNHLUADVEV5KND47SNMMHQTMPAY5FM/