I vaguely remember that restarting the VM helps, but I don't think we know the root cause.
Adding Barak to help with the restart.

On Jan 6, 2016 10:20 AM, "Fabian Deutsch" <[email protected]> wrote:
> Hey,
>
> our Node Next builds are also failing with some error around loop devices.
>
> This worked just before Christmas, but has been failing constantly this year.
>
> Is the root cause already known?
>
> Ryan and Tolik were looking into this from the Node side.
>
> - fabian
>
>
> On Wed, Dec 23, 2015 at 4:52 PM, Nir Soffer <[email protected]> wrote:
> > On Wed, Dec 23, 2015 at 5:11 PM, Eyal Edri <[email protected]> wrote:
> >> I'm guessing this will be solved by running it on Lago?
> >> Isn't that what Yaniv is working on now?
> >
> > Yes, this may be more stable, but I heard that a Lago setup takes about
> > an hour, and the whole run about 3 hours, so a lot of work is needed
> > until we can use it.
> >
> >> Or are these unit tests and not functional?
> >
> > That's the problem: these tests fail because they do not test our code,
> > but the integration of our code with the environment. For example, if
> > the test cannot find an available loop device, the test will fail.
> >
> > I think we must move these tests to the integration test package,
> > which does not run on the CI. These tests can be run only on a VM with
> > root privileges, and only a single test per VM at a time, to avoid
> > races when accessing shared resources (devices, network, etc.).
> >
> > The best way to run such tests is to start a stateless VM based on a
> > template that includes all the requirements, so we don't need to pay
> > for yum install on each test (which may take 2-3 minutes).
> >
> > Some of our customers are using similar setups. Using such a setup for
> > our own tests is the best thing we can do to improve the product.
> >
> >> e.
> >>
> >> On Wed, Dec 23, 2015 at 4:48 PM, Dan Kenigsberg <[email protected]> wrote:
> >>>
> >>> On Wed, Dec 23, 2015 at 03:21:31AM +0200, Nir Soffer wrote:
> >>> > Hi all,
> >>> >
> >>> > We see too many failures of tests using loop devices. Is it possible
> >>> > that we run tests concurrently on the same slave, using all the
> >>> > available loop devices, or maybe creating races between different
> >>> > tests?
> >>> >
> >>> > It seems that we need a new decorator for disabling tests on the CI
> >>> > slaves, since this environment is too fragile.
> >>> >
> >>> > Here are some failures:
> >>> >
> >>> > 01:10:33 ======================================================================
> >>> > 01:10:33 ERROR: testLoopMount (mountTests.MountTests)
> >>> > 01:10:33 ----------------------------------------------------------------------
> >>> > 01:10:33 Traceback (most recent call last):
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/mountTests.py", line 128, in testLoopMount
> >>> > 01:10:33     m.mount(mntOpts="loop")
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 225, in mount
> >>> > 01:10:33     return self._runcmd(cmd, timeout)
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 241, in _runcmd
> >>> > 01:10:33     raise MountError(rc, ";".join((out, err)))
> >>> > 01:10:33 MountError: (32, ';mount: /tmp/tmpZuJRNk: failed to setup loop device: No such file or directory\n')
> >>> > 01:10:33 -------------------- >> begin captured logging << --------------------
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /tmp/tmpZuJRNk (cwd None)
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: SUCCESS: <err> = 'mke2fs 1.42.13 (17-May-2015)\n'; <rc> = 0
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /tmp/tmpZuJRNk /var/tmp/tmpJO52Xj (cwd None)
> >>> > 01:10:33 --------------------- >> end captured logging << ---------------------
> >>> > 01:10:33
> >>> > 01:10:33 ======================================================================
> >>> > 01:10:33 ERROR: testSymlinkMount (mountTests.MountTests)
> >>> > 01:10:33 ----------------------------------------------------------------------
> >>> > 01:10:33 Traceback (most recent call last):
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/mountTests.py", line 150, in testSymlinkMount
> >>> > 01:10:33     m.mount(mntOpts="loop")
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 225, in mount
> >>> > 01:10:33     return self._runcmd(cmd, timeout)
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/vdsm/storage/mount.py", line 241, in _runcmd
> >>> > 01:10:33     raise MountError(rc, ";".join((out, err)))
> >>> > 01:10:33 MountError: (32, ';mount: /var/tmp/tmp1UQFPz/backing.img: failed to setup loop device: No such file or directory\n')
> >>> > 01:10:33 -------------------- >> begin captured logging << --------------------
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /sbin/mkfs.ext2 -F /var/tmp/tmp1UQFPz/backing.img (cwd None)
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: SUCCESS: <err> = 'mke2fs 1.42.13 (17-May-2015)\n'; <rc> = 0
> >>> > 01:10:33 Storage.Misc.excCmd: DEBUG: /usr/bin/taskset --cpu-list 0-1 /usr/bin/mount -o loop /var/tmp/tmp1UQFPz/link_to_image /var/tmp/tmp1UQFPz/mountpoint (cwd None)
> >>> > 01:10:33 --------------------- >> end captured logging << ---------------------
> >>> > 01:10:33
> >>> > 01:10:33 ======================================================================
> >>> > 01:10:33 ERROR: test_getDevicePartedInfo (parted_utils_tests.PartedUtilsTests)
> >>> > 01:10:33 ----------------------------------------------------------------------
> >>> > 01:10:33 Traceback (most recent call last):
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/testValidation.py", line 97, in wrapper
> >>> > 01:10:33     return f(*args, **kwargs)
> >>> > 01:10:33   File "/home/jenkins/workspace/vdsm_master_check-patch-fc23-x86_64/vdsm/tests/parted_utils_tests.py", line 61, in setUp
> >>> > 01:10:33     self.assertEquals(rc, 0)
> >>> > 01:10:33 AssertionError: 1 != 0
> >>> > 01:10:33 -------------------- >> begin captured logging << --------------------
> >>> > 01:10:33 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 dd if=/dev/zero of=/tmp/tmpasV8TD bs=100M count=1 (cwd None)
> >>> > 01:10:33 root: DEBUG: SUCCESS: <err> = '1+0 records in\n1+0 records out\n104857600 bytes (105 MB) copied, 0.368498 s, 285 MB/s\n'; <rc> = 0
> >>> > 01:10:33 root: DEBUG: /usr/bin/taskset --cpu-list 0-1 losetup -f --show /tmp/tmpasV8TD (cwd None)
> >>> > 01:10:33 root: DEBUG: FAILED: <err> = 'losetup: /tmp/tmpasV8TD: failed to set up loop device: No such file or directory\n'; <rc> = 1
> >>> > 01:10:33 --------------------- >> end captured logging << ---------------------
> >>> >
> >>>
> >>> I've reluctantly marked another test as broken in
> >>> https://gerrit.ovirt.org/50484
> >>> due to a similar problem.
> >>> Your idea of a @brokentest_ci decorator is slightly less bad - at least
> >>> we do not ignore errors in this test when run on non-CI platforms.
> >>>
> >>> Regards,
> >>> Dan.
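[Editor's note: the @brokentest_ci decorator discussed above could be sketched roughly as follows. This is a hypothetical minimal sketch, not code from vdsm; in particular, detecting the CI slave via the JENKINS_URL environment variable is an assumption - any marker the slaves export would do.]

```python
import functools
import os
from unittest import SkipTest


def broken_on_ci(reason):
    """Skip the wrapped test when running on a CI slave, but keep
    running (and reporting) it everywhere else.

    Checking JENKINS_URL is an assumption of this sketch; Jenkins
    sets it in build environments, but any CI marker would work.
    """
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            if os.environ.get("JENKINS_URL"):
                raise SkipTest("Broken on CI: %s" % reason)
            return f(*args, **kwargs)
        return wrapper
    return decorator
```

Unlike a blanket @brokentest, this only skips on the slaves, so the test still guards against regressions when run on developer machines.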
> >>>
> >>> _______________________________________________
> >>> Infra mailing list
> >>> [email protected]
> >>> http://lists.ovirt.org/mailman/listinfo/infra
> >>
> >>
> >> --
> >> Eyal Edri
> >> Associate Manager
> >> EMEA ENG Virtualization R&D
> >> Red Hat Israel
> >>
> >> phone: +972-9-7692018
> >> irc: eedri (on #tlv #rhev-dev #rhev-integ)
> >
> > _______________________________________________
> > Infra mailing list
> > [email protected]
> > http://lists.ovirt.org/mailman/listinfo/infra
>
> --
> Fabian Deutsch <[email protected]>
> RHEV Hypervisor
> Red Hat
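[Editor's note: Nir's "only a single test per VM at a time, to avoid races when accessing shared resources" could also be approximated on a shared slave by serializing the loop-device tests behind an exclusive file lock. A minimal sketch; the class name and lock path are made up for illustration:]

```python
import fcntl


class SharedResourceLock(object):
    """Context manager serializing access to shared host resources
    (loop devices, networks, ...) across test processes on one slave.

    The lock file path is an arbitrary choice for this sketch.
    """

    def __init__(self, path="/tmp/shared-resource.lock"):
        self.path = path
        self._f = None

    def __enter__(self):
        self._f = open(self.path, "w")
        # Blocks until no other process holds the lock.
        fcntl.flock(self._f, fcntl.LOCK_EX)
        return self

    def __exit__(self, *exc_info):
        fcntl.flock(self._f, fcntl.LOCK_UN)
        self._f.close()
```

A test touching loop devices would wrap its body in `with SharedResourceLock(): ...`. Note this only protects against other cooperating tests, not against unrelated users of the devices, so it mitigates rather than fixes the race.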
