Thanks Gal. I expect the problem is fixed until something eats all the space in /dev/shm again. But the usage of /dev/shm is logged in the output, so next time we will be able to detect the problem instantly.
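The "/dev/shm usage is logged in the output" idea can be sketched with a small helper. This is a hypothetical illustration, not code from the suite: it reports tmpfs usage the way df(1) does and lists leftover entries, such as the stale lago environment mentioned below.

```python
import os

def fs_usage(path="/dev/shm"):
    """Return (total_bytes, free_bytes, percent_used) for the
    filesystem holding *path*, similar to what df(1) reports."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    free = st.f_bavail * st.f_frsize
    used_pct = 100.0 * (total - free) / total if total else 0.0
    return total, free, used_pct

def list_entries(path="/dev/shm"):
    """List entries on the tmpfs with their sizes; stale lago
    environments or leftover sem.* files would show up here."""
    entries = []
    for name in sorted(os.listdir(path)):
        try:
            entries.append((name, os.path.getsize(os.path.join(path, name))))
        except OSError:
            pass  # entry disappeared between listdir() and stat()
    return entries
```

Logging the output of these two at the start of a job run would make a filling-up /dev/shm visible before the suite fails.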
From my point of view it would be good to know why /dev/shm was full, to
prevent this situation in the future.

On Mon, 19 Mar 2018 18:44:54 +0200 Gal Ben Haim <gbenh...@redhat.com> wrote:

> I see that this failure happens a lot on ovirt-srv19.phx.ovirt.org
> <http://jenkins.ovirt.org/computer/ovirt-srv19.phx.ovirt.org>, and by
> different projects that use ansible.
> Not sure it is related, but I've found (and removed) a stale lago
> environment in /dev/shm that was created by
> ovirt-system-tests_he-basic-iscsi-suite-master
> <http://jenkins.ovirt.org/view/oVirt%20system%20tests/job/ovirt-system-tests_he-basic-iscsi-suite-master/>.
> The stale environment caused the suite to not run in /dev/shm.
> The maximum number of semaphore arrays on both ovirt-srv19.phx.ovirt.org
> and ovirt-srv23.phx.ovirt.org (which runs the ansible suite successfully)
> is 128.
>
> On Mon, Mar 19, 2018 at 3:37 PM, Yedidyah Bar David <d...@redhat.com> wrote:
>
>> Failed also here:
>>
>> http://jenkins.ovirt.org/job/ovirt-system-tests_master_check-patch-el7-x86_64/4540/
>>
>> The patch triggering this affects many suites, and the job failed
>> during ansible-suite-master.
>>
>> On Mon, Mar 19, 2018 at 3:10 PM, Eyal Edri <ee...@redhat.com> wrote:
>>
>>> Gal and Daniel are looking into it; strange that it's not affecting
>>> all suites.
>>>
>>> On Mon, Mar 19, 2018 at 2:11 PM, Dominik Holler <dhol...@redhat.com> wrote:
>>>
>>>> Looks like /dev/shm has run out of space.
>>>>
>>>> On Mon, 19 Mar 2018 13:33:28 +0200 Leon Goldberg <lgold...@redhat.com> wrote:
>>>>
>>>>> Hey, any updates?
>>>>>
>>>>> On Sun, Mar 18, 2018 at 10:44 AM, Edward Haas <eh...@redhat.com> wrote:
>>>>>
>>>>>> We are doing nothing special there, just executing ansible through
>>>>>> their API.
>>>>>>
>>>>>> On Sun, Mar 18, 2018 at 10:42 AM, Daniel Belenky <dbele...@redhat.com> wrote:
>>>>>>
>>>>>>> It's not a space issue. Other suites ran successfully on that slave
>>>>>>> after your suite.
>>>>>>> I think the problem is the setting for max semaphores, though I
>>>>>>> don't know what you're doing to reach that limit.
>>>>>>>
>>>>>>> [dbelenky@ovirt-srv18 ~]$ ipcs -ls
>>>>>>>
>>>>>>> ------ Semaphore Limits --------
>>>>>>> max number of arrays = 128
>>>>>>> max semaphores per array = 250
>>>>>>> max semaphores system wide = 32000
>>>>>>> max ops per semop call = 32
>>>>>>> semaphore max value = 32767
>>>>>>>
>>>>>>> On Sun, Mar 18, 2018 at 10:31 AM, Edward Haas <eh...@redhat.com> wrote:
>>>>>>>
>>>>>>>> http://jenkins.ovirt.org/job/ovirt-system-tests_network-suite-master/
>>>>>>>>
>>>>>>>> On Sun, Mar 18, 2018 at 10:24 AM, Daniel Belenky <dbele...@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Edi,
>>>>>>>>>
>>>>>>>>> Are there any logs? Where are you running the suite? May I have
>>>>>>>>> a link?
>>>>>>>>>
>>>>>>>>> On Sun, Mar 18, 2018 at 8:20 AM, Edward Haas <eh...@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> Good morning,
>>>>>>>>>>
>>>>>>>>>> We are running in the OST network suite a test module with
>>>>>>>>>> Ansible, and it started failing during the weekend on "OSError:
>>>>>>>>>> [Errno 28] No space left on device" when attempting to take a
>>>>>>>>>> lock in the multiprocessing python module.
>>>>>>>>>>
>>>>>>>>>> It smells like a slave resource problem, could someone help
>>>>>>>>>> investigate this?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Edy.
>>>>>>>>>>
>>>>>>>>>> =================================== FAILURES ===================================
>>>>>>>>>> ______________________ test_ovn_provider_create_scenario _______________________
>>>>>>>>>>
>>>>>>>>>> os_client_config = None
>>>>>>>>>>
>>>>>>>>>>     def test_ovn_provider_create_scenario(os_client_config):
>>>>>>>>>>         _test_ovn_provider('create_scenario.yml')
>>>>>>>>>>
>>>>>>>>>> network-suite-master/tests/test_ovn_provider.py:68:
>>>>>>>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>>>>>>>>> network-suite-master/tests/test_ovn_provider.py:78: in _test_ovn_provider
>>>>>>>>>>     playbook.run()
>>>>>>>>>> network-suite-master/lib/ansiblelib.py:127: in run
>>>>>>>>>>     self._run_playbook_executor()
>>>>>>>>>> network-suite-master/lib/ansiblelib.py:138: in _run_playbook_executor
>>>>>>>>>>     pbex = PlaybookExecutor(**self._pbex_args)
>>>>>>>>>> /usr/lib/python2.7/site-packages/ansible/executor/playbook_executor.py:60: in __init__
>>>>>>>>>>     self._tqm = TaskQueueManager(inventory=inventory,
>>>>>>>>>>         variable_manager=variable_manager, loader=loader,
>>>>>>>>>>         options=options, passwords=self.passwords)
>>>>>>>>>> /usr/lib/python2.7/site-packages/ansible/executor/task_queue_manager.py:104: in __init__
>>>>>>>>>>     self._final_q = multiprocessing.Queue()
>>>>>>>>>> /usr/lib64/python2.7/multiprocessing/__init__.py:218: in Queue
>>>>>>>>>>     return Queue(maxsize)
>>>>>>>>>> /usr/lib64/python2.7/multiprocessing/queues.py:63: in __init__
>>>>>>>>>>     self._rlock = Lock()
>>>>>>>>>> /usr/lib64/python2.7/multiprocessing/synchronize.py:147: in __init__
>>>>>>>>>>     SemLock.__init__(self, SEMAPHORE, 1, 1)
>>>>>>>>>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>>>>>>>>>>
>>>>>>>>>> self = <Lock(owner=unknown)>, kind = 1, value = 1, maxvalue = 1
>>>>>>>>>>
>>>>>>>>>>     def __init__(self, kind, value, maxvalue):
>>>>>>>>>>         sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
>>>>>>>>>> E   OSError: [Errno 28] No space left on device
>>>>>>>>>>
>>>>>>>>>> /usr/lib64/python2.7/multiprocessing/synchronize.py:75: OSError
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> DANIEL BELENKY
>>>>>>>>> RHV DEVOPS
>>>
>>> --
>>> Eyal edri
>>> MANAGER, RHV DevOps
>>> EMEA VIRTUALIZATION R&D
>>> Red Hat EMEA <https://www.redhat.com/>
>>> phone: +972-9-7692018  irc: eedri (on #tlv #rhev-dev #rhev-integ)
>>
>> --
>> Didi

_______________________________________________
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra
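For context on why a full /dev/shm surfaces here as Errno 28: on Linux, each multiprocessing.Lock (including the locks created inside multiprocessing.Queue, as in the traceback above) is backed by a POSIX semaphore, which glibc allocates on the /dev/shm tmpfs, so _multiprocessing.SemLock fails with ENOSPC when that filesystem is full. A minimal sketch of a guard that turns the cryptic error into an actionable one (hypothetical helper, not part of the suite):

```python
import errno
import multiprocessing

def make_lock_or_explain():
    """Create a multiprocessing.Lock, translating ENOSPC from the
    semaphore layer into an actionable message. On Linux the backing
    POSIX semaphore lives on /dev/shm, so 'No space left on device'
    from SemLock usually means that tmpfs is full."""
    try:
        return multiprocessing.Lock()
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            raise RuntimeError(
                "Could not allocate a POSIX semaphore; "
                "check free space on /dev/shm")
        raise
```

This is why Daniel's ipcs -ls output is not the limiting factor: ipcs reports System V semaphore limits, while Python's multiprocessing uses POSIX semaphores, which are constrained by /dev/shm space rather than by the 128-array SysV limit.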