On Mon, Dec 17, 2018 at 8:39 AM Barak Korren <[email protected]> wrote:

>
>
> On Mon, 17 Dec 2018 at 08:32, Edward Haas <[email protected]> wrote:
>
>>
>>
>> On Sun, Dec 16, 2018 at 3:31 PM Barak Korren <[email protected]> wrote:
>>
>>>
>>>
>>> On Sun, 16 Dec 2018 at 14:44, Edward Haas <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Sun, Dec 16, 2018 at 2:40 PM Nir Soffer <[email protected]> wrote:
>>>>
>>>>> On Sun, Dec 16, 2018 at 2:21 PM Nir Soffer <[email protected]> wrote:
>>>>>
>>>>>> On Sun, Dec 2, 2018 at 8:18 AM Edward Haas <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> We have this failure that pops randomly:
>>>>>>>>>>>
>>>>>>>>>>> 1. All tests pass
>>>>>>>>>>>
>>>>>>>>>>> *00:13:13.284* ___________________________________ summary 
>>>>>>>>>>> ____________________________________*00:13:13.285*   tests: 
>>>>>>>>>>> commands succeeded*00:13:13.286*   storage-py27: commands 
>>>>>>>>>>> succeeded*00:13:13.286*   storage-py36: commands 
>>>>>>>>>>> succeeded*00:13:13.286*   lib-py27: commands 
>>>>>>>>>>> succeeded*00:13:13.287*   lib-py36: commands 
>>>>>>>>>>> succeeded*00:13:13.288*   network-py27: commands 
>>>>>>>>>>> succeeded*00:13:13.290*   network-py36: commands 
>>>>>>>>>>> succeeded*00:13:13.291*   virt-py27: commands 
>>>>>>>>>>> succeeded*00:13:13.292*   virt-py36: commands 
>>>>>>>>>>> succeeded*00:13:13.293*   congratulations :)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2. But we fail to collect logs at the end
>>>>>>>>>>>
>>>>>>>>>>> *00:14:35.992* 
>>>>>>>>>>> ##########################################################*00:14:35.995*
>>>>>>>>>>>  ## Wed Nov 28 17:39:50 UTC 2018 Finished env: 
>>>>>>>>>>> fc28:fedora-28-x86_64*00:14:35.996* ##      took 764 
>>>>>>>>>>> seconds*00:14:35.997* ##      rc = 1*00:14:35.997* 
>>>>>>>>>>> ##########################################################*00:14:36.009*
>>>>>>>>>>>  ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* 
>>>>>>>>>>> ##! Last 20 log entries: 
>>>>>>>>>>> /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* 
>>>>>>>>>>> ##!*00:14:36.012* 
>>>>>>>>>>> journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013*
>>>>>>>>>>>  tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file 
>>>>>>>>>>> changed as we read it*00:14:36.014* 
>>>>>>>>>>> journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015*
>>>>>>>>>>>  lastlog*00:14:36.015* libvirt/*00:14:36.015* 
>>>>>>>>>>> libvirt/lxc/*00:14:36.015* libvirt/libxl/*00:14:36.016* 
>>>>>>>>>>> libvirt/qemu/*00:14:36.016* 
>>>>>>>>>>> libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017*
>>>>>>>>>>>  libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* 
>>>>>>>>>>> ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* 
>>>>>>>>>>> README*00:14:36.018* samba/*00:14:36.018* samba/old/*00:14:36.018* 
>>>>>>>>>>> sssd/*00:14:36.018* tallylog*00:14:36.018* wtmp*00:14:36.018* Took 
>>>>>>>>>>> 678 seconds*00:14:36.018* 
>>>>>>>>>>> ===================================*00:14:36.019* ##!*00:14:36.019* 
>>>>>>>>>>> ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019* 
>>>>>>>>>>> ##!########################################################
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This looks like an issue with vdsm check-patch.sh:
>>>>>>>>>>>
>>>>>>>>>>> function collect_logs {
>>>>>>>>>>>     res=$?
>>>>>>>>>>>     [ "$res" -ne 0 ] && echo "*** err: $res"
>>>>>>>>>>>     cd /var/log
>>>>>>>>>>>     tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" *
>>>>>>>>>>>     cd /var/host_log
>>>>>>>>>>>     tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" *
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> trap collect_logs EXIT
>>>>>>>>>>>
>>>>>>>>>>> Seems that tar fail to collect log if the log is modified while
>>>>>>>>>>> copied, which makes sense.
>>>>>>>>>>>
>>>>>>>>>>> We can ignore errors in tar, since log collection should not
>>>>>>>>>>> fail the build, but I think
>>>>>>>>>>> a better solution is to avoid collecting any logs since vdsm
>>>>>>>>>>> writes its own logs during
>>>>>>>>>>> tests - all the info must be in vdsm log.
>>>>>>>>>>>
>>>>>>>>>>> Here is the list of collected logs:
>>>>>>>>>>>
>>>>>>>>>>> *00:13:47.280* + tar -cvzf 
>>>>>>>>>>> /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz
>>>>>>>>>>>  btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs 
>>>>>>>>>>> hawkey.log journal lastlog libvirt openvswitch README tallylog 
>>>>>>>>>>> vdsm_tests.log wtmp yum.log*00:13:47.285* btmp*00:13:47.285* 
>>>>>>>>>>> dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309* 
>>>>>>>>>>> dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* 
>>>>>>>>>>> glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* 
>>>>>>>>>>> journal/*00:13:47.313* lastlog*00:13:47.315* libvirt/*00:13:47.315* 
>>>>>>>>>>> libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317* 
>>>>>>>>>>> openvswitch/ovs-vswitchd.log*00:13:47.318* 
>>>>>>>>>>> openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* 
>>>>>>>>>>> tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* 
>>>>>>>>>>> wtmp*00:13:47.343* yum.log*00:13:47.349* + cd 
>>>>>>>>>>> /var/host_log*00:13:47.350* + tar -cvzf 
>>>>>>>>>>> /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz
>>>>>>>>>>>  anaconda audit boot.log btmp chrony cloud-init.log 
>>>>>>>>>>> cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log 
>>>>>>>>>>> firewalld glusterfs hawkey.log journal lastlog libvirt 
>>>>>>>>>>> ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* 
>>>>>>>>>>> anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* 
>>>>>>>>>>> anaconda/ks-script-l5qnynnj.log*00:13:47.358* 
>>>>>>>>>>> anaconda/storage.log*00:13:47.359* 
>>>>>>>>>>> anaconda/program.log*00:13:47.395* 
>>>>>>>>>>> anaconda/ks-script-b5_08tmo.log*00:13:47.396* 
>>>>>>>>>>> anaconda/ks-script-6uks8bp3.log*00:13:47.397* 
>>>>>>>>>>> anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* 
>>>>>>>>>>> anaconda/journal.log*00:13:47.449* 
>>>>>>>>>>> anaconda/dnf.librepo.log*00:13:47.458* 
>>>>>>>>>>> anaconda/packaging.log*00:13:47.465* 
>>>>>>>>>>> anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467* 
>>>>>>>>>>> anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504* 
>>>>>>>>>>> audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814* 
>>>>>>>>>>> audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008* 
>>>>>>>>>>> audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* 
>>>>>>>>>>> btmp*00:13:48.157* chrony/*00:13:48.159* 
>>>>>>>>>>> cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161* 
>>>>>>>>>>> cron*00:13:48.162* dnf.librepo.log*00:13:49.930* 
>>>>>>>>>>> dnf.log*00:13:51.335* dnf.rpm.log*00:13:51.421* 
>>>>>>>>>>> firewalld*00:13:51.423* glusterfs/*00:13:51.424* 
>>>>>>>>>>> hawkey.log*00:13:51.704* journal/*00:13:51.708* 
>>>>>>>>>>> journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709* 
>>>>>>>>>>> journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817*
>>>>>>>>>>>  tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file 
>>>>>>>>>>> changed as we read it*00:13:55.819* 
>>>>>>>>>>> journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915*
>>>>>>>>>>>  lastlog*00:13:55.923* libvirt/*00:13:55.924* 
>>>>>>>>>>> libvirt/lxc/*00:13:55.926* libvirt/libxl/*00:13:55.927* 
>>>>>>>>>>> libvirt/qemu/*00:13:55.928* 
>>>>>>>>>>> libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929*
>>>>>>>>>>>  libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930* 
>>>>>>>>>>> ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* 
>>>>>>>>>>> README*00:13:55.933* samba/*00:13:55.933* samba/old/*00:13:55.935* 
>>>>>>>>>>> sssd/*00:13:55.935* tallylog*00:13:55.935* wtmp
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Most if not all are lot relevant to vdsm tests, and should not
>>>>>>>>>>> be collected.
>>>>>>>>>>>
>>>>>>>>>>> This was added in:
>>>>>>>>>>>
>>>>>>>>>>> commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9
>>>>>>>>>>> Author: Edward Haas <[email protected]>
>>>>>>>>>>> Date:   Tue Apr 17 10:53:11 2018 +0300
>>>>>>>>>>>
>>>>>>>>>>>     automation: Collect logs even when check-patch fails
>>>>>>>>>>>
>>>>>>>>>>>     Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a
>>>>>>>>>>>     Signed-off-by: Edward Haas <[email protected]>
>>>>>>>>>>>
>>>>>>>>>>> Reviewed in:
>>>>>>>>>>> https://gerrit.ovirt.org/c/90370
>>>>>>>>>>>
>>>>>>>>>>> Edward, can you explain why do we need to collect logs during
>>>>>>>>>>> check-patch,
>>>>>>>>>>> and why do we need to collect all the logs in the system?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> check-patch are running unit and integrations tests.
>>>>>>>>>> The integration tests are touching the OS and other packages
>>>>>>>>>> (like openvswitch).
>>>>>>>>>> It was added so we can debug why tests failed.
>>>>>>>>>>
>>>>>>>>>> I guess we can now separate the unit and integration tests, but
>>>>>>>>>> it will not solve
>>>>>>>>>> the problem presented here.
>>>>>>>>>> Failing to collect the logs silently sounds a good enough
>>>>>>>>>> solution to me.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Barak suggested to just exclude the journal:
>>>>>>>>> https://gerrit.ovirt.org/#/c/95850/
>>>>>>>>>
>>>>>>>>
>>>>>>>> This is fixed now, thanks!
>>>>>>>>
>>>>>>>
>>>>>> Not fixed yet, we still fail collecting /var/host_log:
>>>>>>
>>>>>> + cd /var/host_log
>>>>>>
>>>>>> + tar --exclude 'journal/*' -czf
>>>>>> /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/mock_varlogs.tar.gz
>>>>>> btmp faillog glusterfs grubby_prune_debug lastlog libvirt openvswitch 
>>>>>> swtpm
>>>>>> tallylog vdsm_tests.log wtmp yum.log
>>>>>> + cd /var/host_log
>>>>>> + tar -czf
>>>>>> /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz
>>>>>> anaconda audit boot.log boot.log-20181210 boot.log-20181211
>>>>>> boot.log-20181212 boot.log-20181213 boot.log-20181214 boot.log-20181215
>>>>>> boot.log-20181216 btmp btmp-20181201 cron cron-20181126 cron-20181202
>>>>>> cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby
>>>>>> grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126
>>>>>> maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats
>>>>>> ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga
>>>>>> secure spooler spooler-20181126 spooler-20181202 spooler-20181210
>>>>>> spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log
>>>>>> yum.log-20170101 yum.log-20180101
>>>>>> tar: journal/d2b3276bfc6c7a4e95ce6b2b9b5d0f20/system.journal: file
>>>>>> changed as we read it
>>>>>>
>>>>>>
>>>>>> https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_standard-check-patch/runs/825/nodes/119/steps/209/log/?start=0
>>>>>>
>>>>>> Copying the binary logs from journal is not the way to copy logs.
>>>>>>
>>>>>> If we need the host journal, it should be collected by CI infra using
>>>>>> journalctl.
>>>>>>
>>>>>
>>>> But these are collected by the CI, not us. We just take what was
>>>> already collected.
>>>> Gal, Barak, can you explain this?
>>>>
>>>
>>> As you can see in the code Nir patched, We don't collect it, you do.
>>>
>>
>> We collected what was already collected by CI. Who is collecting those
>> logs into host_log?
>>
>
> its probably just a bind mount of /var/log on the host, so the host itself
> is doing that...
>

But not all logs are there. That was the reason to collect them from
/var/log as well.


>
>>
>>
>>>
>>>
>>>>
>>>>
>>>>> This should fix the issue:
>>>>> https://gerrit.ovirt.org/c/96244/
>>>>>
>>>>>
>>>>>>
>>>>>> Nir
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Any reason why we exclude journal only for /var/log, and do collect
>>>>>>>> the binary journal
>>>>>>>> from /var/host_log? I guess it can fail in the same way.
>>>>>>>>
>>>>>>>
>>>>>>> As far as I know, /vatests/integration/vlan_test.pyr/host_log is
>>>>>>> collected by the CI and placed (copy) there.
>>>>>>> It does not collect everything, therefore we added the second one.
>>>>>>>
>>>>>>>
>>>>>>>> If we collect logs for integration tests, the most important log is
>>>>>>>> the journal, and now
>>>>>>>> we skip it.
>>>>>>>>
>>>>>>>
>>>>>>> I think it appears in /var/host_log.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> We can get the journal in a reliable way like this:
>>>>>>>>
>>>>>>>>     journalctl --since build-date > /tmp/journal.log
>>>>>>>>
>>>>>>>> Nir
>>>>>>>>
>>>>>>>>>
>>>
>>> --
>>> Barak Korren
>>> RHV DevOps team , RHCE, RHCi
>>> Red Hat EMEA
>>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>>>
>>
>
> --
> Barak Korren
> RHV DevOps team , RHCE, RHCi
> Red Hat EMEA
> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/BDVJ7YYXYNWDVFRSNOCHZ5DXLPT63L3E/

Reply via email to