On Mon, Mar 30, 2020 at 5:38 PM Galit Rosenthal <[email protected]> wrote:

> It looks like the local repo stops running.
> When I run curl before the failure just to check the status, I can see it
> isn't accessible.
>
> I'm trying to see where it fails or what causes it to fail.
>
> I managed to reproduce it on BM
>

I thought that moving setup_storage would mitigate the issue:
https://gerrit.ovirt.org/#/c/107989/
But it just postponed the error to a later phase; now adding a host fails
with the same issue: Failed to download metadata for repo 'alocalsync'

https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/6710

So Galit, please take a look; oVirt CQ has been suffering from this issue for
more than a week now.
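
To narrow down when the local repo goes away, the manual curl check could be scripted and run right before each yum/dnf step. Below is a minimal sketch, not existing OST code: the function names and the example base URL are hypothetical, and the only thing it relies on is that a yum/dnf repo serves its metadata index at repodata/repomd.xml.

```python
import time
import urllib.error
import urllib.request


def repomd_url(base_url):
    """Return the URL of the metadata index of a yum/dnf repo."""
    return base_url.rstrip('/') + '/repodata/repomd.xml'


def fetch_status(url, timeout=5):
    """Return the HTTP status for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status (404, 503, ...).
        return e.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, etc.
        return None


def wait_for_repo(base_url, attempts=5, delay=2.0,
                  fetch=fetch_status, sleep=time.sleep):
    """Poll the repo metadata until it answers 200 or attempts run out.

    Returns a list of (attempt, status) tuples so the caller can log
    when the repo was (un)reachable. `fetch` and `sleep` are injectable
    to allow testing without a network.
    """
    url = repomd_url(base_url)
    history = []
    for attempt in range(1, attempts + 1):
        status = fetch(url)
        history.append((attempt, status))
        if status == 200:
            break
        if attempt < attempts:
            sleep(delay)
    return history
```

Calling something like wait_for_repo('http://<repo-host>:<port>/<path>/') before install_deps and logging the returned history would show whether the repo was already down before yum ran, or died in between. The real address of the 'alocalsync' repo has to be taken from the suite's repo configuration; it is not assumed here.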

>
> On Mon, Mar 30, 2020 at 6:23 PM Marcin Sobczyk <[email protected]>
> wrote:
>
>> Hi Galit
>>
>> I can see the issue again - now in manual OST runs:
>>
>>
>> https://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_manual/6711/consoleFull#L2,856
>>
>> Regards, Marcin
>>
>> On 3/23/20 10:09 PM, Marcin Sobczyk wrote:
>>
>>
>>
>> On 3/23/20 8:51 PM, Galit Rosenthal wrote:
>>
>> I ran it locally just now, using the extra sources as it runs in the CQ, and it
>> didn't fail for me.
>>
>> I will continue to investigate tomorrow,
>>
>> Marcin, did you see this issue also in check_patch or only in CQ?
>>
>> I wasn't aware of the issue till Nir raised it - I was working with the
>> patch previously
>> and both check-patch and manual runs were fine. I think it concerns only
>> CQ then.
>>
>> Regards,
>> Galit
>>
>> On Mon, Mar 23, 2020 at 4:29 PM Galit Rosenthal <[email protected]>
>> wrote:
>>
>>> I will look at it.
>>>
>>> On Mon, Mar 23, 2020 at 4:18 PM Martin Perina <[email protected]>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Mar 23, 2020 at 3:16 PM Marcin Sobczyk <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On 3/23/20 3:10 PM, Marcin Sobczyk wrote:
>>>>> >
>>>>> >
>>>>> > On 3/23/20 2:53 PM, Nir Soffer wrote:
>>>>> >> On Mon, Mar 23, 2020 at 3:26 PM Marcin Sobczyk <[email protected]> wrote:
>>>>> >>>
>>>>> >>>
>>>>> >>> On 3/23/20 2:17 PM, Nir Soffer wrote:
>>>>> >>>> On Mon, Mar 23, 2020 at 1:25 PM Marcin Sobczyk
>>>>> >>>> <[email protected]> wrote:
>>>>> >>>>>
>>>>> >>>>> On 3/21/20 1:18 AM, Nir Soffer wrote:
>>>>> >>>>>
>>>>> >>>>> On Fri, Mar 20, 2020 at 9:35 PM Nir Soffer <[email protected]>
>>>>> >>>>> wrote:
>>>>> >>>>>> Looks like an infrastructure issue setting up storage on the engine host.
>>>>> >>>>>>
>>>>> >>>>>> Here are 2 failing builds with unrelated changes:
>>>>> >>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6677/
>>>>> >>>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6678/
>>>>> >>>>> Rebuilding still fails in setup_storage:
>>>>> >>>>>
>>>>> >>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6679/testReport/
>>>>> >>>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6680/testReport/
>>>>> >>>>>
>>>>> >>>>>> Is this a known issue?
>>>>> >>>>>>
>>>>> >>>>>> Error Message
>>>>> >>>>>>
>>>>> >>>>>> AssertionError: setup_storage.sh failed. Exit code is 1 assert 1 == 0   -1   +0
>>>>> >>>>>>
>>>>> >>>>>> Stacktrace
>>>>> >>>>>>
>>>>> >>>>>> prefix = <ovirtlago.prefix.OvirtPrefix object at 0x7f6fd2b998d0>
>>>>> >>>>>>
>>>>> >>>>>>       @pytest.mark.run(order=14)
>>>>> >>>>>>       def test_configure_storage(prefix):
>>>>> >>>>>>           engine = prefix.virt_env.engine_vm()
>>>>> >>>>>>           result = engine.ssh(
>>>>> >>>>>>               [
>>>>> >>>>>>                   '/tmp/setup_storage.sh',
>>>>> >>>>>>               ],
>>>>> >>>>>>           )
>>>>> >>>>>>>         assert result.code == 0, 'setup_storage.sh failed. Exit code is %s' % result.code
>>>>> >>>>>> E       AssertionError: setup_storage.sh failed. Exit code is 1
>>>>> >>>>>> E       assert 1 == 0
>>>>> >>>>>> E         -1
>>>>> >>>>>> E         +0
>>>>> >>>>>>
>>>>> >>>>>>
>>>>> >>>>>> The pytest traceback is nice, but in this case it does not
>>>>> >>>>>> show any useful info.
>>>>> >>>>>>
>>>>> >>>>>> Since we run a script using ssh, the error message should include
>>>>> >>>>>> the process stdout and stderr,
>>>>> >>>>>> which can probably explain the failure.
>>>>> >>>>> I posted https://gerrit.ovirt.org/#/c/107830/ to improve logging
>>>>> >>>>> during storage setup.
>>>>> >>>>> Unfortunately AFAICS it didn't fail, so I guess we'll have to
>>>>> >>>>> merge it and wait for a failed job to get some helpful logs.
>>>>> >>>> Thanks.
>>>>> >>>>
>>>>> >>>> It still fails for me with current code:
>>>>> >>>>
>>>>> >>>> https://jenkins.ovirt.org/job/ovirt-system-tests_manual/6689/testReport/
>>>>> >>>>
>>>>> >>>> Same when using current vdsm master.
>>>>> >>> Updated the patch according to your suggestions and currently trying
>>>>> >>> out OST for the 4th time -
>>>>> >>> all previous runs succeeded. I guess I'm out of luck :)
>>>>> >> It succeeds on your local OST setup but fails on Jenkins?
>>>>> > No, I mean jenkins - both check-patch runs didn't fail on this script.
>>>>> > I also tried running OST manually twice and same thing happened.
>>>>> > Anyway - the patch has been merged now so if any failure occurs in CQ
>>>>> > we should know what's going on.
>>>>> Ok, finally caught a failure in CQ [1]:
>>>>>
>>>>> [2020-03-23T14:14:09.836Z]         if result.code != 0:
>>>>> [2020-03-23T14:14:09.836Z]             msg = (
>>>>> [2020-03-23T14:14:09.836Z]                 'setup_storage.sh failed with exit code: {}.\n'
>>>>> [2020-03-23T14:14:09.836Z]                 'stdout:\n{}'
>>>>> [2020-03-23T14:14:09.836Z]                 'stderr:\n{}'
>>>>> [2020-03-23T14:14:09.836Z]             ).format(result.code, result.out, result.err)
>>>>> [2020-03-23T14:14:09.836Z] >           raise RuntimeError(msg)
>>>>> [2020-03-23T14:14:09.836Z] E           RuntimeError: setup_storage.sh failed with exit code: 1.
>>>>> [2020-03-23T14:14:09.836Z] E           stdout:
>>>>> [2020-03-23T14:14:09.836Z] E           Reposync & Extra Sources Content                0.0  B/s |   0  B     00:00
>>>>> [2020-03-23T14:14:09.836Z] E           stderr:
>>>>> [2020-03-23T14:14:09.836Z] E           + set -xe
>>>>> [2020-03-23T14:14:09.836Z] E           + MAIN_NFS_DEV=disk/by-id/scsi-0QEMU_QEMU_HARDDISK_2
>>>>> [2020-03-23T14:14:09.836Z] E           + ISCSI_DEV=disk/by-id/scsi-0QEMU_QEMU_HARDDISK_3
>>>>> [2020-03-23T14:14:09.836Z] E           + NUM_LUNS=5
>>>>> [2020-03-23T14:14:09.836Z] E           ++ uname -r
>>>>> [2020-03-23T14:14:09.836Z] E           ++ awk -F. '{print $(NF-1)}'
>>>>> [2020-03-23T14:14:09.836Z] E           + DIST=el8_1
>>>>> [2020-03-23T14:14:09.836Z] E           + main
>>>>> [2020-03-23T14:14:09.836Z] E           ++ hostname
>>>>> [2020-03-23T14:14:09.836Z] E           + [[ lago-basic-suite-master-engine == *\i\p\v\6* ]]
>>>>> [2020-03-23T14:14:09.836Z] E           + install_deps
>>>>> [2020-03-23T14:14:09.836Z] E           + systemctl disable --now kdump.service
>>>>> [2020-03-23T14:14:09.836Z] E           Removed /etc/systemd/system/multi-user.target.wants/kdump.service.
>>>>> [2020-03-23T14:14:09.836Z] E           + yum install --nogpgcheck -y nfs-utils rpcbind lvm2 targetcli sg3_utils iscsi-initiator-utils lsscsi policycoreutils-python-utils
>>>>> [2020-03-23T14:14:09.836Z] E           Failed to download metadata for repo 'alocalsync'
>>>>> [2020-03-23T14:14:09.836Z] E           Error: Failed to download metadata for repo 'alocalsync'
>>>>>
>>>>>
>>>>> [1]
>>>>>
>>>>> https://jenkins.ovirt.org/blue/organizations/jenkins/ovirt-master_change-queue-tester/detail/ovirt-master_change-queue-tester/21420/pipeline
>>>>
>>>>
>>>> Galit, could you please take a look?
>>>>
>>>>>
>>>>>
>>>>> >
>>>>> >>
>>>>> >>>>>> Also I wonder why this code is called as a test
>>>>> >>>>>> (test_configure_storage). This looks like setup
>>>>> >>>>>> step so it should run as a fixture.
>>>>> >>>>> That's true, but the pytest porting effort was about providing a
>>>>> >>>>> bare minimum to move away from nose.
>>>>> >>>>> Organizing the tests into proper setup/fixtures is a huge task and
>>>>> >>>>> will probably be implemented
>>>>> >>>>> incrementally in the near future.
>>>>> >>>> Understood
>>>>> >>>>
>>>>> >
>>>>>
>>>>>
>>>>
>>>> --
>>>> Martin Perina
>>>> Manager, Software Engineering
>>>> Red Hat Czech s.r.o.
>>>>
>>>
>>>
>>> --
>>>
>>> GALIT ROSENTHAL
>>>
>>> SOFTWARE ENGINEER
>>>
>>> Red Hat
>>>
>>> <https://www.redhat.com/>
>>>
>>> [email protected]    T: 972-9-7692230
>>> <https://red.ht/sig>
>>>
>>
>>
>>
>>
>>
>>
>
>


-- 
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/MCZ6TCP5NJL5RSWDFON76AH7WRGOY7GH/
