On Wed, Dec 19, 2018 at 7:07 PM Nir Soffer <[email protected]> wrote:
> On Sun, Dec 16, 2018 at 2:40 PM Nir Soffer <[email protected]> wrote: > > > > On Sun, Dec 16, 2018 at 2:21 PM Nir Soffer <[email protected]> wrote: > >> > >> On Sun, Dec 2, 2018 at 8:18 AM Edward Haas <[email protected]> wrote: > >>> > >>> > >>> > >>> On Sat, Dec 1, 2018 at 11:10 PM Nir Soffer <[email protected]> wrote: > >>>> > >>>> On Thu, Nov 29, 2018 at 11:21 AM Edward Haas <[email protected]> > wrote: > >>>>> > >>>>> > >>>>> > >>>>> On Thu, Nov 29, 2018 at 10:41 AM Edward Haas <[email protected]> > wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Wed, Nov 28, 2018 at 8:12 PM Nir Soffer <[email protected]> > wrote: > >>>>>>> > >>>>>>> We have this failure that pops randomly: > >>>>>>> > >>>>>>> 1. All tests pass > >>>>>>> > >>>>>>> 00:13:13.284 ___________________________________ summary > ____________________________________ > >>>>>>> 00:13:13.285 tests: commands succeeded > >>>>>>> 00:13:13.286 storage-py27: commands succeeded > >>>>>>> 00:13:13.286 storage-py36: commands succeeded > >>>>>>> 00:13:13.286 lib-py27: commands succeeded > >>>>>>> 00:13:13.287 lib-py36: commands succeeded > >>>>>>> 00:13:13.288 network-py27: commands succeeded > >>>>>>> 00:13:13.290 network-py36: commands succeeded > >>>>>>> 00:13:13.291 virt-py27: commands succeeded > >>>>>>> 00:13:13.292 virt-py36: commands succeeded > >>>>>>> 00:13:13.293 congratulations :) > >>>>>>> > >>>>>>> > >>>>>>> 2. But we fail to collect logs at the end > >>>>>>> > >>>>>>> 00:14:35.992 > ########################################################## > >>>>>>> 00:14:35.995 ## Wed Nov 28 17:39:50 UTC 2018 Finished env: > fc28:fedora-28-x86_64 > >>>>>>> 00:14:35.996 ## took 764 seconds > >>>>>>> 00:14:35.997 ## rc = 1 > >>>>>>> 00:14:35.997 > ########################################################## > >>>>>>> 00:14:36.009 ##! ERROR vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv > >>>>>>> 00:14:36.010 ##! Last 20 log entries: > /tmp/mock_logs.Lcop4ZOq/script/stdout_stderr.log > >>>>>>> 00:14:36.011 ##! > >>>>>>> 00:14:36.012 > journal/b087148aba6d49b9bbef488e52a48752/system.journal > >>>>>>> 00:14:36.013 tar: > journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we > read it > >>>>>>> 00:14:36.014 > journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal > >>>>>>> 00:14:36.015 lastlog > >>>>>>> 00:14:36.015 libvirt/ > >>>>>>> 00:14:36.015 libvirt/lxc/ > >>>>>>> 00:14:36.015 libvirt/libxl/ > >>>>>>> 00:14:36.016 libvirt/qemu/ > >>>>>>> 00:14:36.016 > libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log > >>>>>>> 00:14:36.017 libvirt/uml/ > >>>>>>> 00:14:36.017 ovirt-guest-agent/ > >>>>>>> 00:14:36.017 ovirt-guest-agent/ovirt-guest-agent.log > >>>>>>> 00:14:36.017 README > >>>>>>> 00:14:36.018 samba/ > >>>>>>> 00:14:36.018 samba/old/ > >>>>>>> 00:14:36.018 sssd/ > >>>>>>> 00:14:36.018 tallylog > >>>>>>> 00:14:36.018 wtmp > >>>>>>> 00:14:36.018 Took 678 seconds > >>>>>>> 00:14:36.018 =================================== > >>>>>>> 00:14:36.019 ##! > >>>>>>> 00:14:36.019 ##! ERROR ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > >>>>>>> 00:14:36.019 > ##!######################################################## > >>>>>>> > >>>>>>> > >>>>>>> This looks like an issue with vdsm check-patch.sh: > >>>>>>> > >>>>>>> function collect_logs { > >>>>>>> res=$? > >>>>>>> [ "$res" -ne 0 ] && echo "*** err: $res" > >>>>>>> cd /var/log > >>>>>>> tar -cvzf "$EXPORT_DIR/mock_varlogs.tar.gz" * > >>>>>>> cd /var/host_log > >>>>>>> tar -cvzf "$EXPORT_DIR/host_varlogs.tar.gz" * > >>>>>>> } > >>>>>>> > >>>>>>> trap collect_logs EXIT > >>>>>>> > >>>>>>> Seems that tar fail to collect log if the log is modified while > copied, which makes sense. > >>>>>>> > >>>>>>> We can ignore errors in tar, since log collection should not fail > the build, but I think > >>>>>>> a better solution is to avoid collecting any logs since vdsm > writes its own logs during > >>>>>>> tests - all the info must be in vdsm log. > >>>>>>> > >>>>>>> Here is the list of collected logs: > >>>>>>> > >>>>>>> 00:13:47.280 + tar -cvzf > /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/mock_varlogs.tar.gz > btmp dnf.librepo.log dnf.log dnf.rpm.log faillog glusterfs hawkey.log > journal lastlog libvirt openvswitch README tallylog vdsm_tests.log wtmp > yum.log > >>>>>>> 00:13:47.285 btmp > >>>>>>> 00:13:47.285 dnf.librepo.log > >>>>>>> 00:13:47.299 dnf.log > >>>>>>> 00:13:47.309 dnf.rpm.log > >>>>>>> 00:13:47.310 faillog > >>>>>>> 00:13:47.311 glusterfs/ > >>>>>>> 00:13:47.312 hawkey.log > >>>>>>> 00:13:47.313 journal/ > >>>>>>> 00:13:47.313 lastlog > >>>>>>> 00:13:47.315 libvirt/ > >>>>>>> 00:13:47.315 libvirt/qemu/ > >>>>>>> 00:13:47.316 openvswitch/ > >>>>>>> 00:13:47.317 openvswitch/ovs-vswitchd.log > >>>>>>> 00:13:47.318 openvswitch/ovsdb-server.log > >>>>>>> 00:13:47.319 README > >>>>>>> 00:13:47.320 tallylog > >>>>>>> 00:13:47.321 vdsm_tests.log > >>>>>>> 00:13:47.342 wtmp > >>>>>>> 00:13:47.343 yum.log > >>>>>>> 00:13:47.349 + cd /var/host_log > >>>>>>> 00:13:47.350 + tar -cvzf > /home/jenkins/workspace/vdsm_master_check-patch-fc28-x86_64/vdsm/exported-artifacts/host_varlogs.tar.gz > anaconda audit boot.log btmp chrony cloud-init.log cloud-init-output.log > cron dnf.librepo.log dnf.log dnf.rpm.log firewalld glusterfs hawkey.log > journal lastlog libvirt ovirt-guest-agent README samba sssd tallylog wtmp > >>>>>>> 00:13:47.356 anaconda/ > >>>>>>> 00:13:47.356 anaconda/ifcfg.log > >>>>>>> 00:13:47.357 anaconda/ks-script-l5qnynnj.log > >>>>>>> 00:13:47.358 anaconda/storage.log > >>>>>>> 00:13:47.359 anaconda/program.log > >>>>>>> 00:13:47.395 anaconda/ks-script-b5_08tmo.log > >>>>>>> 00:13:47.396 anaconda/ks-script-6uks8bp3.log > >>>>>>> 00:13:47.397 anaconda/hawkey.log > >>>>>>> 00:13:47.398 anaconda/syslog > >>>>>>> 00:13:47.406 anaconda/journal.log > >>>>>>> 00:13:47.449 anaconda/dnf.librepo.log > >>>>>>> 00:13:47.458 anaconda/packaging.log > >>>>>>> 00:13:47.465 anaconda/dbus.log > >>>>>>> 00:13:47.466 anaconda/anaconda.log > >>>>>>> 00:13:47.467 anaconda/ks-script-slrcz39_.log > >>>>>>> 00:13:47.503 audit/ > >>>>>>> 00:13:47.504 audit/audit.log.3 > >>>>>>> 00:13:47.657 audit/audit.log.2 > >>>>>>> 00:13:47.814 audit/audit.log.1 > >>>>>>> 00:13:47.981 audit/audit.log > >>>>>>> 00:13:48.008 audit/audit.log.4 > >>>>>>> 00:13:48.155 boot.log > >>>>>>> 00:13:48.156 btmp > >>>>>>> 00:13:48.157 chrony/ > >>>>>>> 00:13:48.159 cloud-init.log > >>>>>>> 00:13:48.159 cloud-init-output.log > >>>>>>> 00:13:48.161 cron > >>>>>>> 00:13:48.162 dnf.librepo.log > >>>>>>> 00:13:49.930 dnf.log > >>>>>>> 00:13:51.335 dnf.rpm.log > >>>>>>> 00:13:51.421 firewalld > >>>>>>> 00:13:51.423 glusterfs/ > >>>>>>> 00:13:51.424 hawkey.log > >>>>>>> 00:13:51.704 journal/ > >>>>>>> 00:13:51.708 journal/b087148aba6d49b9bbef488e52a48752/ > >>>>>>> 00:13:51.709 > journal/b087148aba6d49b9bbef488e52a48752/system.journal > >>>>>>> 00:13:55.817 tar: > journal/b087148aba6d49b9bbef488e52a48752/system.journal: file changed as we > read it > >>>>>>> 00:13:55.819 > journal/b087148aba6d49b9bbef488e52a48752/user-1000.journal > >>>>>>> 00:13:55.915 lastlog > >>>>>>> 00:13:55.923 libvirt/ > >>>>>>> 00:13:55.924 libvirt/lxc/ > >>>>>>> 00:13:55.926 libvirt/libxl/ > >>>>>>> 00:13:55.927 libvirt/qemu/ > >>>>>>> 00:13:55.928 > libvirt/qemu/LiveOS-f920001d-be4e-47ea-ac26-72480fd5be87.log > >>>>>>> 00:13:55.929 libvirt/uml/ > >>>>>>> 00:13:55.930 ovirt-guest-agent/ > >>>>>>> 00:13:55.930 ovirt-guest-agent/ovirt-guest-agent.log > >>>>>>> 00:13:55.932 README > >>>>>>> 00:13:55.933 samba/ > >>>>>>> 00:13:55.933 samba/old/ > >>>>>>> 00:13:55.935 sssd/ > >>>>>>> 00:13:55.935 tallylog > >>>>>>> 00:13:55.935 wtmp > >>>>>>> > >>>>>>> > >>>>>>> Most if not all are lot relevant to vdsm tests, and should not be > collected. > >>>>>>> > >>>>>>> This was added in: > >>>>>>> > >>>>>>> commit 9c9c17297433e5a5a49aa19cde10b206e7db61e9 > >>>>>>> Author: Edward Haas <[email protected]> > >>>>>>> Date: Tue Apr 17 10:53:11 2018 +0300 > >>>>>>> > >>>>>>> automation: Collect logs even when check-patch fails > >>>>>>> > >>>>>>> Change-Id: Idfe07ce6fc55473b1db1d7f16754f559cc5c345a > >>>>>>> Signed-off-by: Edward Haas <[email protected]> > >>>>>>> > >>>>>>> Reviewed in: > >>>>>>> https://gerrit.ovirt.org/c/90370 > >>>>>>> > >>>>>>> Edward, can you explain why do we need to collect logs during > check-patch, > >>>>>>> and why do we need to collect all the logs in the system? > >>>>>> > >>>>>> > >>>>>> check-patch are running unit and integrations tests. > >>>>>> The integration tests are touching the OS and other packages (like > openvswitch). > >>>>>> It was added so we can debug why tests failed. > >>>>>> > >>>>>> I guess we can now separate the unit and integration tests, but it > will not solve > >>>>>> the problem presented here. > >>>>>> Failing to collect the logs silently sounds a good enough solution > to me. > >>>>> > >>>>> > >>>>> Barak suggested to just exclude the journal: > https://gerrit.ovirt.org/#/c/95850/ > >>>> > >>>> > >>>> This is fixed now, thanks! > >> > >> > >> Not fixed yet, we still fail collecting /var/host_log: > >> > >> + cd /var/host_log > >> > >> + tar --exclude 'journal/*' -czf > /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/mock_varlogs.tar.gz > btmp faillog glusterfs grubby_prune_debug lastlog libvirt openvswitch swtpm > tallylog vdsm_tests.log wtmp yum.log > >> + cd /var/host_log > >> + tar -czf > /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz > anaconda audit boot.log boot.log-20181210 boot.log-20181211 > boot.log-20181212 boot.log-20181213 boot.log-20181214 boot.log-20181215 > boot.log-20181216 btmp btmp-20181201 cron cron-20181126 cron-20181202 > cron-20181210 cron-20181216 dmesg dmesg.old firewalld glusterfs grubby > grubby_prune_debug httpd journal lastlog libvirt maillog maillog-20181126 > maillog-20181202 maillog-20181210 maillog-20181216 messages ntpstats > ovirt-engine ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga > secure spooler spooler-20181126 spooler-20181202 spooler-20181210 > spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log > yum.log-20170101 yum.log-20180101 > >> tar: journal/d2b3276bfc6c7a4e95ce6b2b9b5d0f20/system.journal: file > changed as we read it > >> > >> > https://jenkins.ovirt.org/blue/rest/organizations/jenkins/pipelines/vdsm_standard-check-patch/runs/825/nodes/119/steps/209/log/?start=0 > >> > >> Copying the binary logs from journal is not the way to copy logs. > >> > >> If we need the host journal, it should be collected by CI infra using > journalctl. > > > > > > This should fix the issue: > > https://gerrit.ovirt.org/c/96244/ > > This was not anough, it fails again: > > + tar --exclude 'journal/*' -czf > > /home/jenkins/workspace/vdsm_standard-check-patch/vdsm/exported-artifacts/host_varlogs.tar.gz > anaconda audit boot.log boot.log-20181213 boot.log-20181214 > boot.log-20181215 boot.log-20181216 boot.log-20181217 > boot.log-20181218 boot.log-20181219 btmp btmp-20181201 cron > cron-20181126 cron-20181202 cron-20181210 cron-20181216 dmesg > dmesg.old firewalld glusterfs grubby grubby_prune_debug httpd journal > lastlog libvirt maillog maillog-20181126 maillog-20181202 > maillog-20181210 maillog-20181216 messages ntpstats ovirt-engine > ovirt-guest-agent ovirt-imageio-proxy ppp puppet qemu-ga secure > spooler spooler-20181126 spooler-20181202 spooler-20181210 > spooler-20181216 tallylog tuned wpa_supplicant.log wtmp yum.log > yum.log-20170101 yum.log-20180101 > > tar: secure: file changed as we read it > > We can exclude also "secure", but this is not the right way. > > Can we make a list of logs we are interested in, instead of collecting > everything? > > Then we can setup something more reliable to do the collection. > This should finally fix the issue: https://gerrit.ovirt.org/c/96412/ > > Nir > > > > >> > >> > >> Nir > >> > >>>> > >>>> > >>>> Any reason why we exclude journal only for /var/log, and do collect > the binary journal > >>>> from /var/host_log? I guess it can fail in the same way. > >>> > >>> > >>> As far as I know, /var/host_log is collected by the CI and placed > (copy) there. > >>> It does not collect everything, therefore we added the second one. > >>> > >>>> > >>>> If we collect logs for integration tests, the most important log is > the journal, and now > >>>> we skip it. > >>> > >>> > >>> I think it appears in /var/host_log. > >>> > >>>> > >>>> > >>>> We can get the journal in a reliable way like this: > >>>> > >>>> journalctl --since build-date > /tmp/journal.log > >>>> > >>>> Nir >
_______________________________________________ Devel mailing list -- [email protected] To unsubscribe send an email to [email protected] Privacy Statement: https://www.ovirt.org/site/privacy-policy/ oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/EHYBFXZHV5NW3GDDL4RY3S7NGDFXF7IQ/
