On Wed, Mar 17, 2021 at 3:24 PM Michal Skrivanek < [email protected]> wrote:
> > > On 17. 3. 2021, at 13:53, Dana Elfassy <[email protected]> wrote: > > Adding +Marcin Sobczyk <[email protected]> > > On Mon, Mar 15, 2021 at 9:59 AM Yedidyah Bar David <[email protected]> > wrote: > >> On Mon, Mar 15, 2021 at 7:55 AM Yedidyah Bar David <[email protected]> >> wrote: >> > >> > Hi all, >> > >> > This started a few days ago [1] and randomly happens since then: >> > >> > E DEBUG: Configuration: >> > E DEBUG: command: collect >> > E DEBUG: Traceback (most recent call last): >> > E DEBUG: File >> > "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py", >> > line 2067, in <module> >> > E DEBUG: '%s directory is not empty.' % >> (conf["local_tmp_dir"]) >> > E DEBUG: Exception: /dev/shm/log directory is not >> > empty.ERROR: /dev/shm/log directory is not empty.non-zero return code >> > >> > Michal tried to fix this by using a random directory but it still fails >> [2]: >> > >> > DEBUG: command: collect >> > DEBUG: Traceback (most recent call last): >> > DEBUG: File >> "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py", >> > line 2067, in <module> >> > DEBUG: '%s directory is not empty.' % (conf["local_tmp_dir"]) >> > DEBUG: Exception: /dev/shm/kaN7uY directory is not empty.ERROR: >> > /dev/shm/kaN7uY directory is not empty.non-zero return code >> > >> > Since I suppose that the randomness of mktemp is good enough, it must >> > be something else. Also, the last successful run before [1] used the >> > same OST git commit (same code), so I do not think it's something in >> > OST's code. >> > >> > Any idea? >> > >> > I think I'll push a patch to create and use the directory right before >> > calling ovirt-log-collector, which is probably better in other ways. >> >> My patch [1] still fails, with a somewhat different error message, but >> this made me check further, and while I still do not understand, I have >> this to add: >> >> In the failing runs, ovirt-log-collector is called *twice* in parallel. >> E.g. >> in [2] (the check-patch of [1]): >> >> Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59099]: >> ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p >> /dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors >> --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf >> _uses_shell=True warn=True stdin_add_newline=True >> strip_empty_ends=True argv=None chdir=None executable=None >> creates=None removes=None stdin=None >> Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59124]: >> ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p >> /dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors >> --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf >> _uses_shell=True warn=True stdin_add_newline=True >> strip_empty_ends=True argv=None chdir=None executable=None >> creates=None removes=None stdin=None >> >> It also generates two logs, which you can check/compare. >> >> It's the same for previous ones, e.g. latest nightly [3][4]: >> >> Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59343]: >> ansible-command Invoked with _raw_params=ovirt-log-collector --verbose >> --batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf >> _uses_shell=True warn=True stdin_add_newline=True >> strip_empty_ends=True argv=None chdir=None executable=None >> creates=None removes=None stdin=None >> Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]: >> SELinux is preventing /usr/lib/systemd/systemd from unlink access on >> the sock_file ansible-ssh-lago-basic-suite-master-host-1-22-root. For >> complete SELinux messages run: sealert -l >> d03a8655-9430-4fcf-9892-3b4df1939899 >> Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]: >> SELinux is preventing /usr/lib/systemd/systemd from unlink access on >> the sock_file >> ansible-ssh-lago-basic-suite-master-host-1-22-root.#012#012***** >> Plugin catchall (100. confidence) suggests >> **************************#012#012If you believe that systemd should >> be allowed unlink access on the >> ansible-ssh-lago-basic-suite-master-host-1-22-root sock_file by >> default.#012Then you should report this as a bug.#012You can generate >> a local policy module to allow this access.#012Do#012allow this access >> for now by executing:#012# ausearch -c 'systemd' --raw | audit2allow >> -M my-systemd#012# semodule -X 300 -i my-systemd.pp#012 >> Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59361]: >> ansible-command Invoked with _raw_params=ovirt-log-collector --verbose >> --batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf >> _uses_shell=True warn=True stdin_add_newline=True >> strip_empty_ends=True argv=None chdir=None executable=None >> creates=None removes=None stdin=None >> >> Any idea what might have caused this to start happening? Perhaps >> a bug in ansible, or ansible-runner? It reminds me of [5]. >> Adding Dana and Martin. >> >> I think [5] is quite a serious bug, btw, should be a 4.4.5 blocker. >> > > it’s from January and tehre are no comments there. Dana, any update? > It does look serious but perhaps not really hit in real world scenarios? > It's reproducible outside OST, Dana is investigating intensively last 2 weeks, but so far we haven't been able to find where events are multiplied. But AFAIK this issue is visble only with ansible-runner-service, respectively only in the part when engine fetches events from ARS and create audit_log records from them. It doesn't mean, that playbook tasks are executed multiple times! OST is using ansible-runner directly, so it doesn't seem to me like the same issue So [5] is not a blocker, it's just really annoying if you need to works with audit_log records > >> Best regards, >> >> [1] https://gerrit.ovirt.org/c/ovirt-system-tests/+/113875 >> >> [2] >> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/15980/artifact/check-patch.basic_suite_master.el8.x86_64/test_logs/lago-basic-suite-master-engine/var/log/messages/*view* >> >> [3] >> https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/ >> >> [4] >> https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/artifact/exported-artifacts/test_logs/lago-basic-suite-master-engine/var/log/messages/*view* >> >> [5] https://bugzilla.redhat.com/show_bug.cgi?id=1917707 >> >> > >> > Best regards, >> > >> > [1] >> https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/949/ >> > >> > [2] >> https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/ >> > >> > >> > -- >> > Didi >> >> >> >> -- >> Didi >> >> _______________________________________________ > Infra mailing list -- [email protected] > To unsubscribe send an email to [email protected] > Privacy Statement: https://www.ovirt.org/privacy-policy.html > oVirt Code of Conduct: > https://www.ovirt.org/community/about/community-guidelines/ > List Archives: > https://lists.ovirt.org/archives/list/[email protected]/message/PSO5IRX73FSH2RA5VCDNSPIFZRDDAWCQ/ > > > _______________________________________________ > Devel mailing list -- [email protected] > To unsubscribe send an email to [email protected] > Privacy Statement: https://www.ovirt.org/privacy-policy.html > oVirt Code of Conduct: > https://www.ovirt.org/community/about/community-guidelines/ > List Archives: > https://lists.ovirt.org/archives/list/[email protected]/message/6DZRAMSAJCXNSMZYCSAROOWDGKVF6GQX/ > -- Martin Perina Manager, Software Engineering Red Hat Czech s.r.o.
_______________________________________________ Devel mailing list -- [email protected] To unsubscribe send an email to [email protected] Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/INHZRRNHQDHCOH2EIYJ7RVM3XI4WPBBL/
