The failure happened again on "ovirt-srv04". The suite wasn't run from "/dev/shm" because it was full of stale lago environments from "hc-basic-suite-4.1" and "he-basic-iscsi-suite-4.2". The environments were left behind because Jenkins raised a timeout (the suites had been stuck for 6 hours), so OST's cleanup was never called. I'm going to add an internal timeout to OST.
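
As a rough illustration of the internal-timeout idea (this is not the actual OST patch): the point is that the timeout fires inside the process that owns the cleanup, so the cleanup still runs when a suite hangs. The function name run_with_timeout, the cleanup callback, and the 124 exit code (borrowed from coreutils `timeout`) are all assumptions made for this sketch.

```python
import subprocess
import sys


def run_with_timeout(cmd, cleanup, timeout_sec):
    """Run cmd and always call cleanup(), even if the timeout fires.

    Returns the command's exit code, or 124 if it was killed on timeout.
    Hypothetical helper for illustration; not part of OST or lago.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_sec)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed process
        return 124
    finally:
        # Runs on success, failure, and timeout alike.
        cleanup()


if __name__ == "__main__":
    # Stand-in for a stuck suite: sleeps far longer than the timeout.
    cleaned = []
    rc = run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        cleanup=lambda: cleaned.append("env removed"),
        timeout_sec=1,
    )
    print(rc, cleaned)
```

Note that this only kills the direct child. A real suite spawns further processes, so it would probably need its own process group (for example start_new_session=True plus os.killpg) to be torn down completely.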

On Tue, Mar 20, 2018 at 11:03 AM, Yedidyah Bar David <[email protected]> wrote:

> On Tue, Mar 20, 2018 at 10:57 AM, Barak Korren <[email protected]> wrote:
> > On 20 March 2018 at 10:53, Yedidyah Bar David <[email protected]> wrote:
> >> On Tue, Mar 20, 2018 at 10:11 AM, Barak Korren <[email protected]> wrote:
> >>> On 20 March 2018 at 09:17, Yedidyah Bar David <[email protected]> wrote:
> >>>> On Mon, Mar 19, 2018 at 6:56 PM, Dominik Holler <[email protected]> wrote:
> >>>>> Thanks Gal, I expect the problem is fixed until something eats
> >>>>> all space in /dev/shm.
> >>>>> But the usage of /dev/shm is logged in the output, so we would be able
> >>>>> to detect the problem next time instantly.
> >>>>>
> >>>>> From my point of view it would be good to know why /dev/shm was full,
> >>>>> to prevent this situation in future.
> >>>>
> >>>> Gal already wrote below - it was because some build failed to clean up
> >>>> after itself.
> >>>>
> >>>> I don't know about this specific case, but I was told that I am
> >>>> personally causing such issues by using the 'cancel' button, so I
> >>>> sadly stopped. Sadly, because our CI system is quite loaded and when I
> >>>> know that some build is useless, I wish to kill it and save some
> >>>> load...
> >>>>
> >>>> Back to your point, perhaps we should make jobs check /dev/shm when
> >>>> they _start_, and either alert/fail/whatever if it's not almost free,
> >>>> or, if we know what we are doing, just remove stuff there? That might
> >>>> be much easier than fixing things to clean up in end, and/or debugging
> >>>> why this cleaning failed.
> >>>
> >>> Sure thing, patches to:
> >>>
> >>> [jenkins repo]/jobs/confs/shell-scripts/cleanup_slave.sh
> >>>
> >>> Are welcome, we often find interesting stuff to add there...
> >>>
> >>> If constrained for time, please turn this comment into an orderly RFE in Jira...
> >>
> >> Searched for '/dev/shm' and found way too many places to analyze them
> >> all and add something to cleanup_slave to cover all.
> >
> > Where did you search?
>
> ovirt-system-tests, lago, lago-ost-plugin.
> ovirt-system-tests has 83 occurrences. I realize almost all are in
> lago guests, but looking still takes time...
>
> In theory I can patch cleanup_slave.sh as you suggested, removing
> _everything_ there.
> Not sure this is safe.
>
> >> Pushed this for now:
> >>
> >> https://gerrit.ovirt.org/89215
> >>
> >>> --
> >>> Barak Korren
> >>> RHV DevOps team , RHCE, RHCi
> >>> Red Hat EMEA
> >>> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
> >>
> >> --
> >> Didi
> >
> > --
> > Barak Korren
> > RHV DevOps team , RHCE, RHCi
> > Red Hat EMEA
> > redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>
> --
> Didi

--
*GAL bEN HAIM*
RHV DEVOPS
_______________________________________________ Infra mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/infra
