On Sun, Dec 18, 2016 at 7:17 PM, Nir Soffer <[email protected]> wrote:

> On Sun, Dec 18, 2016 at 6:08 PM, Barak Korren <[email protected]> wrote:
> > On 18 December 2016 at 17:26, Nir Soffer <[email protected]> wrote:
> >> On Sun, Dec 18, 2016 at 4:17 PM, Barak Korren <[email protected]> wrote:
> >
> >> We see a lot of these errors in the rest of the log. This means something
> >> is wrong with this VG.
> >>
> >> This needs deeper investigation from a storage developer on both the
> >> engine and vdsm sides, but I would start by making sure we use clean
> >> LUNs. We are not trying to test esoteric negative flows in the system
> >> tests.
> >
> > Here is the storage setup script:
> > https://gerrit.ovirt.org/gitweb?p=ovirt-system-tests.git;a=blob;f=common/deploy-scripts/setup_storage_unified_he_extra_iscsi_el7.sh;hb=refs/heads/master
>
>     iscsiadm -m discovery -t sendtargets -p 127.0.0.1
>     iscsiadm -m node -L all
>
> This is alarming. Before we serve these LUNs, we should log out
> of these nodes and delete the node records.
>

This shows outdated code (or I need to update it). In the updated code,
where the issue also happens, we do the following as well:
    iscsiadm -m node -U all              # log out of all nodes
    iscsiadm -m node -o delete           # delete all node records
    systemctl stop iscsi.service         # stop the initiator service
    systemctl disable iscsi.service      # and keep it off across reboots
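
A quick sanity check (my own addition, not part of the script): after the
cleanup, both of these should find nothing:
    iscsiadm -m node       # expect: iscsiadm: No records found
    iscsiadm -m session    # expect: iscsiadm: No active sessions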


> > All storage used in the system tests comes from the engine VM itself,
> > and is placed on a newly allocated QCOW2 file (exposed as /dev/sde to
> > the engine VM), so it's unlikely the LUNs are not clean.
>
> We did not change code related to getDeviceList lately; these getPV errors
> tell us that there is an issue in a lower-level component or in the storage
> server.
>
> Does this test pass with an older version of vdsm? Of the engine?
>

We did not test that. It's not very easy to do in ovirt-system-tests,
though I reckon it is possible with some additional work.
Note that I suspect cold and live merge were not actually tested in
ovirt-system-tests for ages, if ever.


>
> >> Did we change something in the system tests project or lago while we
> >> were not looking?
>

Mainly the CentOS 7.2 -> CentOS 7.3 change.


> >
> > Also not likely:
> > https://gerrit.ovirt.org/gitweb?p=ovirt-system-tests.git;a=shortlog
> >
> > The ovirt-system-tests project has its own CI, testing against the
> > latest nightly (we will move it to the last build that passed the tests
> > soon), so we are unlikely to merge breaking code there.
>
> It depends on the tests.
>
> Do you have a test that logs in to the target and creates a VG using
> the LUNs?
>
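
For reference, logging in and building a VG manually boils down to a handful
of commands. A rough sketch (not an existing OST test; the portal, target IQN
and device name are placeholders):

    PORTAL=127.0.0.1
    TARGET=iqn.placeholder:target

    iscsiadm -m discovery -t sendtargets -p "$PORTAL"
    iscsiadm -m node -T "$TARGET" -p "$PORTAL" -l

    pvcreate /dev/sdX          # whatever device node the LUN shows up as
    vgcreate test-vg /dev/sdX

    vgremove -f test-vg        # then tear everything down again
    pvremove /dev/sdX
    iscsiadm -m node -T "$TARGET" -p "$PORTAL" -u
    iscsiadm -m node -o delete
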
> > Then again
> > we're not gating the OS packages, so some breakage may have gone in via
> > CentOS repos...
>
> Are these failures with CentOS 7.2 or 7.3? Both?
>

Unsure.


>
> >> Can we reproduce this issue manually with the same engine and vdsm versions?
> >
> > You have several options:
> > 1: Get engine+vdsm builds from Jenkins:
> >    http://jenkins.ovirt.org/job/ovirt-engine_master_build-artifacts-fc24-x86_64/
> >    http://jenkins.ovirt.org/job/vdsm_master_build-artifacts-el7-x86_64/
> >    (Getting the exact builds that went into a given OST run requires
> >     tracing the job invocation links back from that run)
> >
> > 2: Use the latest experimental repo:
> >    http://resources.ovirt.org/repos/ovirt/experimental/master/latest/rpm/el7/
> >
> > 3: Run lago and OST locally:
> >    (as documented here:
> >     http://ovirt-system-tests.readthedocs.io/en/latest/
> >     you'd need to pass in the vdsm and engine packages to use)
>

That's what I do, on a daily basis.
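
If one goes with option 2 instead, wiring in the experimental repo is a
one-file change: drop something like this into /etc/yum.repos.d/ (the repo
id, file name and gpgcheck=0 are my own choices):

    # /etc/yum.repos.d/ovirt-experimental.repo
    [ovirt-master-experimental]
    name=oVirt master experimental
    baseurl=http://resources.ovirt.org/repos/ovirt/experimental/master/latest/rpm/el7/
    enabled=1
    gpgcheck=0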


>
> Do you know how to set up the system so it runs all the setup code up to
> the code that causes the getPV errors?
>

Yes, that should be fairly easy to do.
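
Once the run is paused there, inspection is mostly a matter of shelling into
the host. A sketch, assuming the stock lago CLI (the deployment directory and
VM name below are illustrative):

    cd deployment-basic_suite_master
    lago shell lago-basic-suite-master-host0

    # then, inside the host:
    iscsiadm -m session
    pvs -vvvv
    vgs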


>
> We need to inspect the system at this point.
>

Let me know and I'll set up a live system quickly tomorrow.
Y.


>
> Nir
>