On Tue, Mar 16, 2021 at 12:50 PM Dana Elfassy <[email protected]> wrote:
>
> About [5] https://bugzilla.redhat.com/show_bug.cgi?id=1917707, I have found
> that when executing the HTTP request for the playbook, some events are
> sometimes missing from the response. We handle the events in a while loop,
> where each iteration continues from the lastEventId. Since not all events
> were listed, the lastEventId (which is returned as the total number of
> events) is smaller than the ID of the last event actually handled. Thus, in
> the next iteration the last events are handled again. It seems that this
> comes from Ansible Runner, and I'm investigating that area now.
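
A toy model of the loop described above may make the failure mode concrete. It is only a sketch: `fetch_events`, `poll`, and the integer event ids are hypothetical stand-ins, not the engine's or Ansible Runner's actual API. The simulated server drops one event from its response. Resuming from a count of received events then re-handles the tail, while resuming after the highest id actually received does not.

```python
# Hypothetical model of the event-polling loop; not the real engine code.


def fetch_events(all_ids, after_id, dropped):
    """Simulated HTTP call: ids greater than after_id, minus dropped ones."""
    return [i for i in all_ids if i > after_id and i not in dropped]


def poll(all_ids, dropped, rounds, resume_from_max):
    handled = []
    last_event_id = 0
    for _ in range(rounds):
        batch = fetch_events(all_ids, last_event_id, dropped)
        handled.extend(batch)
        if resume_from_max:
            # Resume after the highest id actually received.
            if batch:
                last_event_id = max(batch)
        else:
            # Resume from a count of received events; a gap makes this
            # smaller than the id of the last event handled.
            last_event_id = len(handled)
    return handled


if __name__ == "__main__":
    ids = list(range(1, 7))  # events 1..6, event 3 gets dropped
    print(poll(ids, dropped={3}, rounds=2, resume_from_max=False))  # [1, 2, 4, 5, 6, 6]
    print(poll(ids, dropped={3}, rounds=2, resume_from_max=True))   # [1, 2, 4, 5, 6]
```

Resuming from the highest id only avoids the duplicates; in this model the dropped event 3 is still never handled, so the missing events remain the underlying problem.
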

Thanks

> I'm not sure if it's the same issue as you get with the logs, though, as I'm 
> not familiar with this area in the OSTs. Is the creation of the 2 logs done 
> by different threads?

I have no idea either.

>
> Regarding the initial error (' directory is not empty.non-zero return code')
> I'm not sure; I will have to look more into it. Martin, do you have any idea?

I think it's quite obvious that the reason for this is that we run two
ovirt-log-collector processes in parallel. Why we (sometimes) do this
is an open question, but unrelated to ovirt-log-collector (or the
specific error message it emits).
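
A minimal sketch of why two overlapping runs trip the check seen in the traceback, assuming both invocations are handed the same scratch directory. `start_collector` and `run_two` are hypothetical: `start_collector` only mimics the "directory is not empty" check and then leaves a file behind, as a still-running collector would. It uses the system temp dir rather than /dev/shm so it runs anywhere.

```python
import os
import shutil
import tempfile


def start_collector(local_tmp_dir):
    """Hypothetical model of a collector's start-up.

    Mirrors the check in the traceback: refuse to run if the scratch
    directory already has content, otherwise start writing into it.
    """
    if os.listdir(local_tmp_dir):
        raise Exception("%s directory is not empty." % local_tmp_dir)
    # A running collector leaves files here until it finishes.
    open(os.path.join(local_tmp_dir, "in-progress"), "w").close()


def run_two(shared):
    """Start two overlapping runs; return the error messages they hit."""
    if shared:
        d = tempfile.mkdtemp()
        dirs = [d, d]  # a random name, but chosen once and reused
    else:
        dirs = [tempfile.mkdtemp(), tempfile.mkdtemp()]  # one per run
    errors = []
    for d in dirs:
        try:
            start_collector(d)
        except Exception as exc:
            errors.append(str(exc))
    for d in set(dirs):
        shutil.rmtree(d)
    return errors


if __name__ == "__main__":
    print(len(run_two(shared=True)))   # 1
    print(len(run_two(shared=False)))  # 0
```

In this model a random directory name does not help if it is chosen once and shared by both runs; only a separate directory per invocation avoids the collision.
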

>
> On Mon, Mar 15, 2021 at 9:59 AM Yedidyah Bar David <[email protected]> wrote:
>>
>> On Mon, Mar 15, 2021 at 7:55 AM Yedidyah Bar David <[email protected]> wrote:
>> >
>> > Hi all,
>> >
>> > This started a few days ago [1] and has been happening randomly since then:
>> >
>> > E           DEBUG: Configuration:
>> > E           DEBUG: command: collect
>> > E           DEBUG: Traceback (most recent call last):
>> > E           DEBUG:   File
>> > "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py",
>> > line 2067, in <module>
>> > E           DEBUG:     '%s directory is not empty.' % 
>> > (conf["local_tmp_dir"])
>> > E           DEBUG: Exception: /dev/shm/log directory is not
>> > empty.ERROR: /dev/shm/log directory is not empty.non-zero return code
>> >
>> > Michal tried to fix this by using a random directory but it still fails 
>> > [2]:
>> >
>> > DEBUG: command: collect
>> > DEBUG: Traceback (most recent call last):
>> > DEBUG:   File 
>> > "/usr/lib/python3.6/site-packages/ovirt_log_collector/__main__.py",
>> > line 2067, in <module>
>> > DEBUG:     '%s directory is not empty.' % (conf["local_tmp_dir"])
>> > DEBUG: Exception: /dev/shm/kaN7uY directory is not empty.ERROR:
>> > /dev/shm/kaN7uY directory is not empty.non-zero return code
>> >
>> > Since I suppose that the randomness of mktemp is good enough, it must
>> > be something else. Also, the last successful run before [1] used the
>> > same OST git commit (same code), so I do not think it's something in
>> > OST's code.
>> >
>> > Any idea?
>> >
>> > I think I'll push a patch to create and use the directory right before
>> > calling ovirt-log-collector, which is probably better in other ways.
>>
>> My patch [1] still fails, with a somewhat different error message. This
>> made me check further, and while I still do not understand the cause, I
>> have this to add:
>>
>> In the failing runs, ovirt-log-collector is called *twice* in parallel. E.g.
>> in [2] (the check-patch of [1]):
>>
>> Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59099]:
>> ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p
>> /dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors
>> --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf
>> _uses_shell=True warn=True stdin_add_newline=True
>> strip_empty_ends=True argv=None chdir=None executable=None
>> creates=None removes=None stdin=None
>> Mar 15 07:38:59 lago-basic-suite-master-engine platform-python[59124]:
>> ansible-command Invoked with _raw_params=lctmp=$(mktemp -d -p
>> /dev/shm); ovirt-log-collector --verbose --batch --no-hypervisors
>> --local-tmp="${lctmp}" --conf-file=/root/ovirt-log-collector.conf
>> _uses_shell=True warn=True stdin_add_newline=True
>> strip_empty_ends=True argv=None chdir=None executable=None
>> creates=None removes=None stdin=None
>>
>> It also generates two logs, which you can check/compare.
>>
>> It's the same for previous ones, e.g. latest nightly [3][4]:
>>
>> Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59343]:
>> ansible-command Invoked with _raw_params=ovirt-log-collector --verbose
>> --batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf
>> _uses_shell=True warn=True stdin_add_newline=True
>> strip_empty_ends=True argv=None chdir=None executable=None
>> creates=None removes=None stdin=None
>> Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]:
>> SELinux is preventing /usr/lib/systemd/systemd from unlink access on
>> the sock_file ansible-ssh-lago-basic-suite-master-host-1-22-root. For
>> complete SELinux messages run: sealert -l
>> d03a8655-9430-4fcf-9892-3b4df1939899
>> Mar 15 06:23:30 lago-basic-suite-master-engine setroubleshoot[58889]:
>> SELinux is preventing /usr/lib/systemd/systemd from unlink access on
>> the sock_file 
>> ansible-ssh-lago-basic-suite-master-host-1-22-root.#012#012*****
>>  Plugin catchall (100. confidence) suggests
>> **************************#012#012If you believe that systemd should
>> be allowed unlink access on the
>> ansible-ssh-lago-basic-suite-master-host-1-22-root sock_file by
>> default.#012Then you should report this as a bug.#012You can generate
>> a local policy module to allow this access.#012Do#012allow this access
>> for now by executing:#012# ausearch -c 'systemd' --raw | audit2allow
>> -M my-systemd#012# semodule -X 300 -i my-systemd.pp#012
>> Mar 15 06:23:30 lago-basic-suite-master-engine platform-python[59361]:
>> ansible-command Invoked with _raw_params=ovirt-log-collector --verbose
>> --batch --no-hypervisors --conf-file=/root/ovirt-log-collector.conf
>> _uses_shell=True warn=True stdin_add_newline=True
>> strip_empty_ends=True argv=None chdir=None executable=None
>> creates=None removes=None stdin=None
>>
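
The duplicate invocations quoted above can also be found mechanically. The sketch below assumes each syslog record sits on one line (as in /var/log/messages, not wrapped as in this mail) and groups ansible-command records that mention ovirt-log-collector by timestamp and command text. `duplicate_invocations` is a hypothetical helper, and grouping by the same second is a heuristic that would miss runs started a second apart.

```python
import re
from collections import defaultdict

# Matches one unwrapped syslog record like those quoted above (assumed format).
INVOKED = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+platform-python\[(?P<pid>\d+)\]:\s+"
    r"ansible-command Invoked with _raw_params=(?P<cmd>.*?) _uses_shell="
)


def duplicate_invocations(lines, needle="ovirt-log-collector"):
    """Group matching invocations by (timestamp, command) and return the
    groups that were started by more than one process."""
    groups = defaultdict(list)
    for line in lines:
        m = INVOKED.match(line)
        if m and needle in m.group("cmd"):
            groups[(m.group("ts"), m.group("cmd"))].append(int(m.group("pid")))
    return {key: pids for key, pids in groups.items() if len(pids) > 1}
```

Usage: `with open("/var/log/messages") as f: print(duplicate_invocations(f))`. On the 06:23:30 records above it would report PIDs 59343 and 59361 for the same command.
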
>> Any idea what might have caused this to start happening? Perhaps
>> a bug in ansible, or ansible-runner? It reminds me of [5].
>> Adding Dana and Martin.
>>
>> I think [5] is quite a serious bug, btw; it should be a 4.4.5 blocker.
>>
>> Best regards,
>>
>> [1] https://gerrit.ovirt.org/c/ovirt-system-tests/+/113875
>>
>> [2] 
>> https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/15980/artifact/check-patch.basic_suite_master.el8.x86_64/test_logs/lago-basic-suite-master-engine/var/log/messages/*view*
>>
>> [3] 
>> https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/
>>
>> [4] 
>> https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/artifact/exported-artifacts/test_logs/lago-basic-suite-master-engine/var/log/messages/*view*
>>
>> [5] https://bugzilla.redhat.com/show_bug.cgi?id=1917707
>>
>> >
>> > Best regards,
>> >
>> > [1] 
>> > https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/949/
>> >
>> > [2] 
>> > https://jenkins.ovirt.org/job/ovirt-system-tests_basic-suite-master_nightly/959/
>> >
>> >
>> > --
>> > Didi
>>
>>
>>
>> --
>> Didi
>>


-- 
Didi
_______________________________________________
Infra mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/ZYZWSIDOJMSPSVSRTPU5JZJXX6XLCQ4Q/
