On Mon, Dec 17, 2018 at 8:39 AM Barak Korren <[email protected]> wrote:
> > > On Mon, 17 Dec 2018 at 08:32, Edward Haas <[email protected]> wrote: > >> >> >> On Sun, Dec 16, 2018 at 3:31 PM Barak Korren <[email protected]> wrote: >> >>> >>> >>> On Sun, 16 Dec 2018 at 14:44, Edward Haas <[email protected]> wrote: >>> >>>> >>>> >>>> On Sun, Dec 16, 2018 at 2:40 PM Nir Soffer <[email protected]> wrote: >>>> >>>>> On Sun, Dec 16, 2018 at 2:21 PM Nir Soffer <[email protected]> wrote: >>>>> >>>>>> On Sun, Dec 2, 2018 at 8:18 AM Edward Haas <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> We have this failure that pops randomly: >>>>>>>>>>> >>>>>>>>>>> 1. All tests pass >>>>>>>>>>> >>>>>>>>>>> *00:13:13.284* ___________________________________ summary >>>>>>>>>>> ____________________________________*00:13:13.285* tests: >>>>>>>>>>> commands succeeded*00:13:13.286* storage-py27: commands >>>>>>>>>>> succeeded*00:13:13.286* storage-py36: commands >>>>>>>>>>> succeeded*00:13:13.286* lib-py27: commands >>>>>>>>>>> succeeded*00:13:13.287* lib-py36: commands >>>>>>>>>>> succeeded*00:13:13.288* network-py27: commands >>>>>>>>>>> succeeded*00:13:13.290* network-py36: commands >>>>>>>>>>> succeeded*00:13:13.291* virt-py27: commands >>>>>>>>>>> succeeded*00:13:13.292* virt-py36: commands >>>>>>>>>>> succeeded*00:13:13.293* congratulations :) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 2. But we fail to collect logs at the end >>>>>>>>>>> >>>>>>>>>>> *00:14:35.992* >>>>>>>>>>> ##########################################################*00:14:35.995* >>>>>>>>>>> ## Wed Nov 28 17:39:50 UTC 2018 Finished env: >>>>>>>>>>> fc28:fedora-28-x86_64*00:14:35.996* ## took 764 >>>>>>>>>>> seconds*00:14:35.997* ## rc = 1*00:14:35.997* >>>>>>>>>>> ##########################################################*00:14:36.009* >>>>>>>>>>> ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv*00:14:36.010* >>>>>>>>>>> ##! Last 20 log entries: >>>>>>>>>>> /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log*00:14:36.011* >>>>>>>>>>> ##!*00:14:36.012* >>>>>>>>>>> journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:14:36.013* >>>>>>>>>>> tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file >>>>>>>>>>> changed as we read it*00:14:36.014* >>>>>>>>>>> journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:14:36.015* >>>>>>>>>>> lastlog*00:14:36.015* libvirt/*00:14:36.015* >>>>>>>>>>> libvirt/lxc/*00:14:36.015* libvirt/libxl/*00:14:36.016* >>>>>>>>>>> libvirt/qemu/*00:14:36.016* >>>>>>>>>>> libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:14:36.017* >>>>>>>>>>> libvirt/uml/*00:14:36.017* ovirt-guest-agent/*00:14:36.017* >>>>>>>>>>> ovirt-guest-agent/ovirt-guest-agent.log*00:14:36.017* >>>>>>>>>>> README*00:14:36.018* samba/*00:14:36.018* samba/old/*00:14:36.018* >>>>>>>>>>> sssd/*00:14:36.018* tallylog*00:14:36.018* wtmp*00:14:36.018* Took >>>>>>>>>>> 678 seconds*00:14:36.018* >>>>>>>>>>> ===================================*00:14:36.019* ##!*00:14:36.019* >>>>>>>>>>> ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^*00:14:36.019* >>>>>>>>>>> ##!######################################################## >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> This looks like an issue with vdsm check-patch.sh: >>>>>>>>>>> >>>>>>>>>>> function collect_logs { >>>>>>>>>>> res=$? >>>>>>>>>>> [ "$res" -ne 0 ] && echo "*** err: $res" >>>>>>>>>>> cd /var/log >>>>>>>>>>> tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * >>>>>>>>>>> cd /var/host_log >>>>>>>>>>> tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> trap collect_logs EXIT >>>>>>>>>>> >>>>>>>>>>> Seems that tar fail to collect log if the log is modified while >>>>>>>>>>> copied, which makes sense. >>>>>>>>>>> >>>>>>>>>>> We can ignore errors in tar, since log collection should not >>>>>>>>>>> fail the build, but I think >>>>>>>>>>> a better solution is to avoid collecting any logs since vdsm >>>>>>>>>>> writes its own logs during >>>>>>>>>>> tests - all the info must be in vdsm log. >>>>>>>>>>> >>>>>>>>>>> Here is the list of collected logs: >>>>>>>>>>> >>>>>>>>>>> *00:13:47.280* + tar -cvzf >>>>>>>>>>> /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz >>>>>>>>>>> btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs >>>>>>>>>>> hawkey.log journal lastlog libvirt openvswitch README tallylog >>>>>>>>>>> vdsm_tests.log wtmp yum.log*00:13:47.285* btmp*00:13:47.285* >>>>>>>>>>> dnf.librepo.log*00:13:47.299* dnf.log*00:13:47.309* >>>>>>>>>>> dnf.rpm.log*00:13:47.310* faillog*00:13:47.311* >>>>>>>>>>> glusterfs/*00:13:47.312* hawkey.log*00:13:47.313* >>>>>>>>>>> journal/*00:13:47.313* lastlog*00:13:47.315* libvirt/*00:13:47.315* >>>>>>>>>>> libvirt/qemu/*00:13:47.316* openvswitch/*00:13:47.317* >>>>>>>>>>> openvswitch/ovs-vswitchd.log*00:13:47.318* >>>>>>>>>>> openvswitch/ovsdb-server.log*00:13:47.319* README*00:13:47.320* >>>>>>>>>>> tallylog*00:13:47.321* vdsm_tests.log*00:13:47.342* >>>>>>>>>>> wtmp*00:13:47.343* yum.log*00:13:47.349* + cd >>>>>>>>>>> /var/host_log*00:13:47.350* + tar -cvzf >>>>>>>>>>> /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz >>>>>>>>>>> anaconda audit boot.log btmp chrony cloud-init.log >>>>>>>>>>> cloud-init-output.log cron dnf.librepo.log dnf.log dnf.rpm.log >>>>>>>>>>> firewalld glusterfs hawkey.log journal lastlog libvirt >>>>>>>>>>> ovirt-guest-agent README samba sssd tallylog wtmp*00:13:47.356* >>>>>>>>>>> anaconda/*00:13:47.356* anaconda/ifcfg.log*00:13:47.357* >>>>>>>>>>> anaconda/ks-script-l5qnynnj.log*00:13:47.358* >>>>>>>>>>> anaconda/storage.log*00:13:47.359* >>>>>>>>>>> anaconda/program.log*00:13:47.395* >>>>>>>>>>> anaconda/ks-script-b5_08tmo.log*00:13:47.396* >>>>>>>>>>> anaconda/ks-script-6uks8bp3.log*00:13:47.397* >>>>>>>>>>> anaconda/hawkey.log*00:13:47.398* anaconda/syslog*00:13:47.406* >>>>>>>>>>> anaconda/journal.log*00:13:47.449* >>>>>>>>>>> anaconda/dnf.librepo.log*00:13:47.458* >>>>>>>>>>> anaconda/packaging.log*00:13:47.465* >>>>>>>>>>> anaconda/dbus.log*00:13:47.466* anaconda/anaconda.log*00:13:47.467* >>>>>>>>>>> anaconda/ks-script-slrcz39_.log*00:13:47.503* audit/*00:13:47.504* >>>>>>>>>>> audit/audit.log.3*00:13:47.657* audit/audit.log.2*00:13:47.814* >>>>>>>>>>> audit/audit.log.1*00:13:47.981* audit/audit.log*00:13:48.008* >>>>>>>>>>> audit/audit.log.4*00:13:48.155* boot.log*00:13:48.156* >>>>>>>>>>> btmp*00:13:48.157* chrony/*00:13:48.159* >>>>>>>>>>> cloud-init.log*00:13:48.159* cloud-init-output.log*00:13:48.161* >>>>>>>>>>> cron*00:13:48.162* dnf.librepo.log*00:13:49.930* >>>>>>>>>>> dnf.log*00:13:51.335* dnf.rpm.log*00:13:51.421* >>>>>>>>>>> firewalld*00:13:51.423* glusterfs/*00:13:51.424* >>>>>>>>>>> hawkey.log*00:13:51.704* journal/*00:13:51.708* >>>>>>>>>>> journal/b087148aba6d49b9bbef488e52a48752/*00:13:51.709* >>>>>>>>>>> journal/b087148aba6d49b9bbef488e52a48752/system.journal*00:13:55.817* >>>>>>>>>>> tar: journal/b087148aba6d49b9bbef488e52a48752/system.journal: file >>>>>>>>>>> changed as we read it*00:13:55.819* >>>>>>>>>>> journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal*00:13:55.915* >>>>>>>>>>> lastlog*00:13:55.923* libvirt/*00:13:55.924* >>>>>>>>>>> libvirt/lxc/*00:13:55.926* libvirt/libxl/*00:13:55.927* >>>>>>>>>>> libvirt/qemu/*00:13:55.928* >>>>>>>>>>> libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log*00:13:55.929* >>>>>>>>>>> libvirt/uml/*00:13:55.930* ovirt-guest-agent/*00:13:55.930* >>>>>>>>>>> ovirt-guest-agent/ovirt-guest-agent.log*00:13:55.932* >>>>>>>>>>> README*00:13:55.933* samba/*00:13:55.933* samba/old/*00:13:55.935* >>>>>>>>>>> sssd/*00:13:55.935* tallylog*00:13:55.935* wtmp >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Most if not all are lot relevant to vdsm tests, and should not >>>>>>>>>>> be collected. >>>>>>>>>>> >>>>>>>>>>> This was added in: >>>>>>>>>>> >>>>>>>>>>> commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 >>>>>>>>>>> Author: Edward Haas <[email protected]> >>>>>>>>>>> Date: Tue Apr 17 10:53:11 2018 +0300 >>>>>>>>>>> >>>>>>>>>>> automation: Collect logs even when check-patch fails >>>>>>>>>>> >>>>>>>>>>> Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a >>>>>>>>>>> Signed-off-by: Edward Haas <[email protected]> >>>>>>>>>>> >>>>>>>>>>> Reviewed in: >>>>>>>>>>> https://gerrit.ovirt.org/c/90370 >>>>>>>>>>> >>>>>>>>>>> Edward, can you explain why do we need to collect logs during >>>>>>>>>>> check-patch, >>>>>>>>>>> and why do we need to collect all the logs in the system? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> check-patch are running unit and integrations tests. >>>>>>>>>> The integration tests are touching the OS and other packages >>>>>>>>>> (like openvswitch). >>>>>>>>>> It was added so we can debug why tests failed. >>>>>>>>>> >>>>>>>>>> I guess we can now separate the unit and integration tests, but >>>>>>>>>> it will not solve >>>>>>>>>> the problem presented here. >>>>>>>>>> Failing to collect the logs silently sounds a good enough >>>>>>>>>> solution to me. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Barak suggested to just exclude the journal: >>>>>>>>> https://gerrit.ovirt.org/#/c/95850/ >>>>>>>>> >>>>>>>> >>>>>>>> This is fixed now, thanks! >>>>>>>> >>>>>>> >>>>>> Not fixed yet, we still fail collecting /var/host_log: >>>>>> >>>>>> + cd /var/host_log >>>>>> >>>>>> + tar --exclude 'journal/*' -czf >>>>>> /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/mock_varlogs.tar.gz >>>>>> btmp faillog glusterfs grubby_prune_debug lastlog libvirt openvswitch >>>>>> swtpm >>>>>> tallylog vdsm_tests.log wtmp yum.log >>>>>> + cd /var/host_log >>>>>> + tar -czf >>>>>> /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz >>>>>> anaconda audit boot.log boot.log-20181210 boot.log-20181211 >>>>>> boot.log-20181212 boot.log-20181213 boot.log-20181214 boot.log-20181215 >>>>>> boot.log-20181216 btmp btmp-20181201 cron cron-20181126 cron-20181202 >>>>>> cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby >>>>>> grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126 >>>>>> maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats >>>>>> ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga >>>>>> secure spooler spooler-20181126 spooler-20181202 spooler-20181210 >>>>>> spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log >>>>>> yum.log-20170101 yum.log-20180101 >>>>>> tar: journal/d2b3276bfc6c7a4e95ce6b2b9b5d0f20/system.journal: file >>>>>> changed as we read it >>>>>> >>>>>> >>>>>> https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_standard-check-patch/runs/825/nodes/119/steps/209/log/?start=0 >>>>>> >>>>>> Copying the binary logs from journal is not the way to copy logs. >>>>>> >>>>>> If we need the host journal, it should be collected by CI infra using >>>>>> journalctl. >>>>>> >>>>> >>>> But these are collected by the CI, not us. We just take what was >>>> already collected. >>>> Gal, Barak, can you explain this? >>>> >>> >>> As you can see in the code Nir patched, We don't collect it, you do. >>> >> >> We collected what was already collected by CI. Who is collecting those >> logs into host_log? >> > > its probably just a bind mount of /var/log on the host, so the host itself > is doing that... > But not all logs are there. That was the reason to collect them from /var/log as well. > >> >> >>> >>> >>>> >>>> >>>>> This should fix the issue: >>>>> https://gerrit.ovirt.org/c/96244/ >>>>> >>>>> >>>>>> >>>>>> Nir >>>>>> >>>>>>> >>>>>>> >>>>>>>> Any reason why we exclude journal only for /var/log, and do collect >>>>>>>> the binary journal >>>>>>>> from /var/host_log? I guess it can fail in the same way. >>>>>>>> >>>>>>> >>>>>>> As far as I know, /vatests/integration/vlan_test.pyr/host_log is >>>>>>> collected by the CI and placed (copy) there. >>>>>>> It does not collect everything, therefore we added the second one. >>>>>>> >>>>>>> >>>>>>>> If we collect logs for integration tests, the most important log is >>>>>>>> the journal, and now >>>>>>>> we skip it. >>>>>>>> >>>>>>> >>>>>>> I think it appears in /var/host_log. >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> We can get the journal in a reliable way like this: >>>>>>>> >>>>>>>> journalctl --since build-date > /tmp/journal.log >>>>>>>> >>>>>>>> Nir >>>>>>>> >>>>>>>>> >>> >>> -- >>> Barak Korren >>> RHV DevOps team , RHCE, RHCi >>> Red Hat EMEA >>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted >>> >> > > -- > Barak Korren > RHV DevOps team , RHCE, RHCi > Red Hat EMEA > redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted >
_______________________________________________ Devel mailing list -- [email protected] To unsubscribe send an email to [email protected] Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/BDVJ7YYXYNWDVFRSNOCHZ5DXLPT63L3E/
