I vaguely remember that restarting the VM helps, but I don't think we know the root cause.
Adding Barak to help with the restart.

On Jan 6, 2016 10:20 AM, "Fabian Deutsch" <[email protected]> wrote:
> Hey,
>
> our Node Next builds are also failing with some error around loop devices.
>
> This worked just before Christmas, but has been failing constantly this year.
>
> Is the root cause already known?
>
> Ryan and Tolik were looking into this from the Node side.
>
> - fabian
>
>
> On Wed, Dec 23, 2015 at 4:52 PM, Nir Soffer <[email protected]> wrote:
> > On Wed, Dec 23, 2015 at 5:11 PM, Eyal Edri <[email protected]> wrote:
> >> I'm guessing this will be solved by running it on Lago?
> >> Isn't that what Yaniv is working on now?
> >
> > Yes, this may be more stable, but I heard that a Lago setup takes about
> > an hour, and the whole run about 3 hours, so a lot of work is needed
> > until we can use it.
> >
> >> Or are these unit tests and not functional?
> >
> > That's the problem: these tests fail because they do not test our code,
> > but the integration of our code with the environment. For example, if
> > the test cannot find an available loop device, the test will fail.
> >
> > I think we must move these tests to the integration test package,
> > which does not run on the CI. These tests can be run only on a VM with
> > root privileges, and only a single test per VM at a time, to avoid
> > races when accessing shared resources (devices, network, etc.).
> >
> > The best way to run such tests is to start a stateless VM based on a
> > template that includes all the requirements, so we don't need to pay
> > for yum install on each test (which may take 2-3 minutes).
> >
> > Some of our customers are using similar setups. Using such a setup for
> > our own tests is the best thing we can do to improve the product.
> >
> >> e.
> >>
> >> On Wed, Dec 23, 2015 at 4:48 PM, Dan Kenigsberg <[email protected]> wrote:
> >>>
> >>> On Wed, Dec 23, 2015 at 03:21:31AM +0200, Nir Soffer wrote:
> >>> > Hi all,
> >>> >
> >>> > We see too many failures of tests using loop devices. Is it possible
> >>> > that we run tests concurrently on the same slave, using all the
> >>> > available loop devices, or maybe creating races between different
> >>> > tests?
> >>> >
> >>> > It seems that we need a new decorator for disabling tests on the CI
> >>> > slaves, since this environment is too fragile.
> >>> >
> >>> > Here are some failures:
> >>> >
> >>> > 01:10:33 ======================================================================
> >>> > 01:10:33 ERROR: testLoopMount (mountTests.MountTests)
> >>> > 01:10:33 ----------------------------------------------------------------------
> >>> > 01:10:33 Traceback (most recent call last):
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/mountTests.py", line 128, in testLoopMount
> >>> > 01:10:33     m.mount(mntOpts="loop")
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 225, in mount
> >>> > 01:10:33     return self._runcmd(cmd, timeout)
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 241, in _runcmd
> >>> > 01:10:33     raise MountError(rc, ";".join((out, err)))
> >>> > 01:10:33 MountError: (32, ';mount: /tmp/tmpZuJRNk: failed to setup loop device: No such file or directory\n')
> >>> > 01:10:33 -------------------- >> begin captured logging << --------------------
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /tmp/tmpZuJRNk (cwd None)
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: SUCCESS: <err> = 'mke2fs 1.42.13 (17-May-2015)\n'; <rc> = 0
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /tmp/tmpZuJRNk /var/tmp/tmpJO52Xj (cwd None)
> >>> > 01:10:33 --------------------- >> end captured logging << ---------------------
> >>> > 01:10:33
> >>> > 01:10:33 ======================================================================
> >>> > 01:10:33 ERROR: testSymlinkMount (mountTests.MountTests)
> >>> > 01:10:33 ----------------------------------------------------------------------
> >>> > 01:10:33 Traceback (most recent call last):
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/mountTests.py", line 150, in testSymlinkMount
> >>> > 01:10:33     m.mount(mntOpts="loop")
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 225, in mount
> >>> > 01:10:33     return self._runcmd(cmd, timeout)
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 241, in _runcmd
> >>> > 01:10:33     raise MountError(rc, ";".join((out, err)))
> >>> > 01:10:33 MountError: (32, ';mount: /var/tmp/tmp1UQFPz/backing.img: failed to setup loop device: No such file or directory\n')
> >>> > 01:10:33 -------------------- >> begin captured logging << --------------------
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /var/tmp/tmp1UQFPz/backing.img (cwd None)
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: SUCCESS: <err> = 'mke2fs 1.42.13 (17-May-2015)\n'; <rc> = 0
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /var/tmp/tmp1UQFPz/link_to_image /var/tmp/tmp1UQFPz/mountpoint (cwd None)
> >>> > 01:10:33 --------------------- >> end captured logging << ---------------------
> >>> > 01:10:33
> >>> > 01:10:33 ======================================================================
> >>> > 01:10:33 ERROR: test_getDevicePartedInfo (parted_utils_tests.PartedUtilsTests)
> >>> > 01:10:33 ----------------------------------------------------------------------
> >>> > 01:10:33 Traceback (most recent call last):
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/testValidation.py", line 97, in wrapper
> >>> > 01:10:33     return f(*args, **kwargs)
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/parted_utils_tests.py", line 61, in setUp
> >>> > 01:10:33     self.assertEquals(rc, 0)
> >>> > 01:10:33 AssertionError: 1 != 0
> >>> > 01:10:33 -------------------- >> begin captured logging << --------------------
> >>> > 01:10:33 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 dd if=/dev/zero of=/tmp/tmpasV8TD bs=100M count=1 (cwd None)
> >>> > 01:10:33 root: DEBUG: SUCCESS: <err> = '1+0 records in\n1+0 records out\n104857600 bytes (105 MB) copied, 0.368498 s, 285 MB/s\n'; <rc> = 0
> >>> > 01:10:33 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 losetup -f --show /tmp/tmpasV8TD (cwd None)
> >>> > 01:10:33 root: DEBUG: FAILED: <err> = 'losetup: /tmp/tmpasV8TD: failed to set up loop device: No such file or directory\n'; <rc> = 1
> >>> > 01:10:33 --------------------- >> end captured logging << ---------------------
> >>> >
> >>>
> >>> I've reluctantly marked another test as broken in
> >>> https://gerrit.ovirt.org/50484
> >>> due to a similar problem.
> >>> Your idea of a @brokentest_ci decorator is slightly less bad - at least
> >>> we do not ignore errors in this test when run on non-CI platforms.
> >>>
> >>> Regards,
> >>> Dan.
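[Editor's note: the @brokentest_ci decorator discussed above could be sketched roughly as follows. This is a hypothetical minimal sketch, not code from vdsm; in particular, detecting the CI slave via the JENKINS_URL environment variable is an assumption - any marker the slaves export would do.]

```python
import functools
import os
from unittest import SkipTest


def broken_on_ci(reason):
    """Skip the wrapped test when running on a CI slave, but keep
    running (and reporting) it everywhere else.

    Checking JENKINS_URL is an assumption of this sketch; Jenkins
    sets it in build environments, but any CI marker would work.
    """
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            if os.environ.get("JENKINS_URL"):
                raise SkipTest("Broken on CI: %s" % reason)
            return f(*args, **kwargs)
        return wrapper
    return decorator
```

Unlike a blanket @brokentest, this only skips on the slaves, so the test still guards against regressions when run on developer machines.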
> >>>
> >>> _______________________________________________
> >>> Infra mailing list
> >>> [email protected]
> >>> http://lists.ovirt.org/mailman/listinfo/infra
> >>
> >>
> >> --
> >> Eyal Edri
> >> Associate Manager
> >> EMEA ENG Virtualization R&D
> >> Red Hat Israel
> >>
> >> phone: +972-9-7692018
> >> irc: eedri (on #tlv #rhev-dev #rhev-integ)
> >
> > _______________________________________________
> > Infra mailing list
> > [email protected]
> > http://lists.ovirt.org/mailman/listinfo/infra
>
> --
> Fabian Deutsch <[email protected]>
> RHEV Hypervisor
> Red Hat
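[Editor's note: Nir's "only a single test per VM at a time, to avoid races when accessing shared resources" could also be approximated on a shared slave by serializing the loop-device tests behind an exclusive file lock. A minimal sketch; the class name and lock path are made up for illustration:]

```python
import fcntl


class SharedResourceLock(object):
    """Context manager serializing access to shared host resources
    (loop devices, networks, ...) across test processes on one slave.

    The lock file path is an arbitrary choice for this sketch.
    """

    def __init__(self, path="/tmp/shared-resource.lock"):
        self.path = path
        self._f = None

    def __enter__(self):
        self._f = open(self.path, "w")
        # Blocks until no other process holds the lock.
        fcntl.flock(self._f, fcntl.LOCK_EX)
        return self

    def __exit__(self, *exc_info):
        fcntl.flock(self._f, fcntl.LOCK_UN)
        self._f.close()
```

A test touching loop devices would wrap its body in `with SharedResourceLock(): ...`. Note this only protects against other cooperating tests, not against unrelated users of the devices, so it mitigates rather than fixes the race.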
