On Mon, Mar 30, 2020 at 5:38 PM Galit Rosenthal <[email protected]> wrote:
> It looks like the local repo stops running.
> When I run curl before the failure just to check the status, I can see it
> isn't accessible.
>
> I'm trying to see where it fails or what cause it to fail.
>
> I manage to reproduce on BM
>

I thought that moving setup_storage would mitigate the issue:

https://gerrit.ovirt.org/#/c/107989/

But it just postponed the error to a later phase; now adding a host fails
with the same issue:

Failed to download metadata for repo 'alocalsync'

https://jenkins.ovirt.org/view/oVirt system tests/job/ovirt-system-tests_manual/6710

So Galit, please take a look; oVirt CQ has been suffering from this issue
for more than a week now.

>
> On Mon, Mar 30, 2020 at 6:23 PM Marcin Sobczyk <[email protected]>
> wrote:
>
>> Hi Galit
>>
>> I can see the issue again - now in manual OST runs:
>>
>>
>> https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/6711/consoleFull#L2,856
>>
>> Regards, Marcin
>>
>> On 3/23/20 10:09 PM, Marcin Sobczyk wrote:
>>
>>
>>
>> On 3/23/20 8:51 PM, Galit Rosenthal wrote:
>>
>> I run it now locally using the extra sources as it runs in the CQ and it
>> didn't fail for me.
>>
>> I will continue to investigate tomorrow,
>>
>> Marcin, did you see this issue also in check_patch or only in CQ?
>>
>> I wasn't aware of the issue till Nir raised it - I was working with the
>> patch previously
>> and both check-patch and manual runs were fine. I think it concerns only
>> CQ then.
>>
>> Regards,
>> Galit
>>
>> On Mon, Mar 23, 2020 at 4:29 PM Galit Rosenthal <[email protected]>
>> wrote:
>>
>>> I will look at it.
>>>
>>> On Mon, Mar 23, 2020 at 4:18 PM Martin Perina <[email protected]>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Mar 23, 2020 at 3:16 PM Marcin Sobczyk <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On 3/23/20 3:10 PM, Marcin Sobczyk wrote:
>>>>> >
>>>>> >
>>>>> > On 3/23/20 2:53 PM, Nir Soffer wrote:
>>>>> >> On Mon, Mar 23, 2020 at 3:26 PM Marcin Sobczyk <[email protected]>
>>>>> >> wrote:
>>>>> >>>
>>>>> >>>
>>>>> >>> On 3/23/20 2:17 PM, Nir Soffer wrote:
>>>>> >>>> On Mon, Mar 23, 2020 at 1:25 PM Marcin Sobczyk
>>>>> >>>> <[email protected]> wrote:
>>>>> >>>>>
>>>>> >>>>> On 3/21/20 1:18 AM, Nir Soffer wrote:
>>>>> >>>>>
>>>>> >>>>> On Fri, Mar 20, 2020 at 9:35 PM Nir Soffer <[email protected]>
>>>>> >>>>> wrote:
>>>>> >>>>>> Looks like infrastructure issue setting up storage on engine host.
>>>>> >>>>>>
>>>>> >>>>>> Here are 2 failing builds with unrelated changes:
>>>>> >>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6677/
>>>>> >>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6678/
>>>>> >>>>> Rebuilding still fails in setup_storage:
>>>>> >>>>>
>>>>> >>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6679/testReport/
>>>>> >>>>>
>>>>> >>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6680/testReport/
>>>>> >>>>>
>>>>> >>>>>> Is this a known issue?
>>>>> >>>>>>
>>>>> >>>>>> Error Message
>>>>> >>>>>>
>>>>> >>>>>> AssertionError: setup_storage.sh failed. Exit code is 1 assert 1
>>>>> >>>>>> == 0 -1 +0
>>>>> >>>>>>
>>>>> >>>>>> Stacktrace
>>>>> >>>>>>
>>>>> >>>>>> prefix = <ovirtlago.prefix.OvirtPrefix object at 0x7f6fd2b998d0>
>>>>> >>>>>>
>>>>> >>>>>>     @pytest.mark.run(order=14)
>>>>> >>>>>>     def test_configure_storage(prefix):
>>>>> >>>>>>         engine = prefix.virt_env.engine_vm()
>>>>> >>>>>>         result = engine.ssh(
>>>>> >>>>>>             [
>>>>> >>>>>>                 '/tmp/setup_storage.sh',
>>>>> >>>>>>             ],
>>>>> >>>>>>         )
>>>>> >>>>>>>        assert result.code == 0, 'setup_storage.sh failed. Exit code is %s' % result.code
>>>>> >>>>>> E AssertionError: setup_storage.sh failed. Exit code is 1
>>>>> >>>>>> E assert 1 == 0
>>>>> >>>>>> E -1
>>>>> >>>>>> E +0
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> The pytest traceback is nice, but in this case it is does not
>>>>> >>>>>> show any useful info.
>>>>> >>>>>>
>>>>> >>>>>> Since we run a script using ssh, the error message should include
>>>>> >>>>>> the process stdout and stderr
>>>>> >>>>>> which probably can explain the failure.
>>>>> >>>>> I posted https://gerrit.ovirt.org/#/c/107830/ to improve logging
>>>>> >>>>> during storage setup.
>>>>> >>>>> Unfortunately AFAICS it didn't fail, so I guess we'll have to
>>>>> >>>>> merge it and wait for a failed job to get some helpful logs.
>>>>> >>>> Thanks.
>>>>> >>>>
>>>>> >>>> It still fails for me with current code:
>>>>> >>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6689/testReport/
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> Same when using current vdsm master.
>>>>> >>> Updated the patch according to your suggestions and currently trying
>>>>> >>> out
>>>>> >>> OST for the 4th time -
>>>>> >>> all previous runs succeeded. I guess I'm out of luck :)
>>>>> >> It succeeds on your local OST setup but fail on Jenkins?
>>>>> > No, I mean jenkins - both check-patch runs didn't fail on this script.
>>>>> > I also tried running OST manually twice and same thing happened.
>>>>> > Anyway - the patch has been merged now so if any failure occurs in CQ
>>>>> > we should know what's going on.
>>>>> Ok, finally caught a failure in CQ [1]:
>>>>>
>>>>> [2020-03-23T14:14:09.836Z]     if result.code != 0:
>>>>> [2020-03-23T14:14:09.836Z]         msg = (
>>>>> [2020-03-23T14:14:09.836Z]             'setup_storage.sh failed with exit code: {}.\n'
>>>>> [2020-03-23T14:14:09.836Z]             'stdout:\n{}'
>>>>> [2020-03-23T14:14:09.836Z]             'stderr:\n{}'
>>>>> [2020-03-23T14:14:09.836Z]         ).format(result.code, result.out, result.err)
>>>>> [2020-03-23T14:14:09.836Z] >       raise RuntimeError(msg)
>>>>> [2020-03-23T14:14:09.836Z] E RuntimeError: setup_storage.sh failed with exit code: 1.
>>>>> [2020-03-23T14:14:09.836Z] E stdout:
>>>>> [2020-03-23T14:14:09.836Z] E Reposync & Extra Sources Content 0.0 B/s | 0 B 00:00
>>>>> [2020-03-23T14:14:09.836Z] E stderr:
>>>>> [2020-03-23T14:14:09.836Z] E + set -xe
>>>>> [2020-03-23T14:14:09.836Z] E + MAIN_NFS_DEV=disk/by-id/scsi-0QEMU_QEMU_HARDDISK_2
>>>>> [2020-03-23T14:14:09.836Z] E + ISCSI_DEV=disk/by-id/scsi-0QEMU_QEMU_HARDDISK_3
>>>>> [2020-03-23T14:14:09.836Z] E + NUM_LUNS=5
>>>>> [2020-03-23T14:14:09.836Z] E ++ uname -r
>>>>> [2020-03-23T14:14:09.836Z] E ++ awk -F. '{print $(NF-1)}'
>>>>> [2020-03-23T14:14:09.836Z] E + DIST=el8_1
>>>>> [2020-03-23T14:14:09.836Z] E + main
>>>>> [2020-03-23T14:14:09.836Z] E ++ hostname
>>>>> [2020-03-23T14:14:09.836Z] E + [[ lago-basic-suite-master-engine == *\i\p\v\6* ]]
>>>>> [2020-03-23T14:14:09.836Z] E + install_deps
>>>>> [2020-03-23T14:14:09.836Z] E + systemctl disable --now kdump.service
>>>>> [2020-03-23T14:14:09.836Z] E Removed /etc/systemd/system/multi-user.target.wants/kdump.service.
>>>>> [2020-03-23T14:14:09.836Z] E + yum install --nogpgcheck -y nfs-utils rpcbind lvm2 targetcli sg3_utils iscsi-initiator-utils lsscsi policycoreutils-python-utils
>>>>> [2020-03-23T14:14:09.836Z] E Failed to download metadata for repo 'alocalsync'
>>>>> [2020-03-23T14:14:09.836Z] E Error: Failed to download metadata for repo 'alocalsync'
>>>>>
>>>>>
>>>>> [1]
>>>>> https://jenkins.ovirt.org/blue/organizations/jenkins/ovirt-master_change-queue-tester/detail/ovirt-master_change-queue-tester/21420/pipeline
>>>>
>>>>
>>>> Galit, could you please take a look?
>>>>
>>>>>
>>>>>
>>>>> >
>>>>> >>
>>>>> >>>>>> Also I wonder why this code is called as a test
>>>>> >>>>>> (test_configure_storage). This looks like setup
>>>>> >>>>>> step so it should run as a fixture.
>>>>> >>>>> That's true, but the pytest porting effort was about providing a
>>>>> >>>>> bare minimum to move away from nose.
>>>>> >>>>> Organizing the tests into proper setup/fixtures is a huge task and
>>>>> >>>>> will be probably implemented
>>>>> >>>>> incrementally in the nearest future.
>>>>> >>>> Understood
>>>>> >>>>
>>>>> >
>>>>>
>>>>
>>>> --
>>>> Martin Perina
>>>> Manager, Software Engineering
>>>> Red Hat Czech s.r.o.
>>>>
>>>
>>>
>>> --
>>>
>>> GALIT ROSENTHAL
>>>
>>> SOFTWARE ENGINEER
>>>
>>> Red Hat
>>>
>>> <https://www.redhat.com/>
>>>
>>> [email protected] T: 972-9-7692230
>>> <https://red.ht/sig>
>>>
>>
>>
>> --
>>
>> GALIT ROSENTHAL
>>
>> SOFTWARE ENGINEER
>>
>> Red Hat
>>
>> <https://www.redhat.com/>
>>
>> [email protected] T: 972-9-7692230
>> <https://red.ht/sig>
>>
>>
>>
>>
>
> --
>
> GALIT ROSENTHAL
>
> SOFTWARE ENGINEER
>
> Red Hat
>
> <https://www.redhat.com/>
>
> [email protected] T: 972-9-7692230
> <https://red.ht/sig>
>

--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.
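
For reference, the reporting pattern discussed in the quoted thread (raise
with the exit code, stdout and stderr instead of a bare assert) can be
sketched roughly as below. This is only an illustration, not the merged OST
patch: it uses the standard library's subprocess as a local stand-in for the
lago engine.ssh() call, and the helper name run_and_report is made up for
this example.

```python
import subprocess


def run_and_report(cmd):
    """Run cmd (a list of arguments); raise with full output on failure.

    Hypothetical helper: subprocess.run stands in for engine.ssh(), so the
    attribute names differ from the result.code/out/err seen in the thread.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Include everything the process printed, so the CI log alone
        # is enough to see why the script failed.
        msg = (
            '{} failed with exit code: {}.\n'
            'stdout:\n{}\n'
            'stderr:\n{}'
        ).format(' '.join(cmd), result.returncode, result.stdout, result.stderr)
        raise RuntimeError(msg)
    return result
```

With a wrapper like this, a failure such as the 'alocalsync' metadata error
shows up directly in the exception text rather than only as "assert 1 == 0".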
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/MCZ6TCP5NJL5RSWDFON76AH7WRGOY7GH/
