On 02/10, Eyal Edri wrote:
> it seems that slave isn't responsive to ssh,
> so it might have some sort of infra issue.
>
> i think we should consider adding some sort of verification process for
> slaves.
> something that will run nightly, or before each job if it's fast enough.
>
> we can think about checking
> - ping
> - ssh
> - git clone..
>
> david, what do you think? it might reduce a lot of false positive failures.
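The three checks proposed above could be sketched roughly as follows. This is only an illustration, not how Jenkins or the oVirt infra does it: the function names, the host name and the repository URL in the usage note are hypothetical, it assumes a Unix-like `ping` and key-based ssh access to the slave, and it swaps the full `git clone` for `git ls-remote`, which exercises the same connection without downloading the repository.

```python
import shlex
import subprocess
from typing import Callable, Dict, List

# A runner takes an argv list and a timeout in seconds; True means exit code 0.
Runner = Callable[[List[str], float], bool]


def run_command(argv: List[str], timeout: float) -> bool:
    """Run a command; a non-zero exit, a timeout or a missing binary is a failure."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def build_checks(host: str, repo_url: str) -> Dict[str, List[str]]:
    """Return the checks in the order they should run: ping, ssh, git."""
    ssh = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host]
    # Assumption: the git check runs on the slave itself, over ssh.
    remote_git = shlex.join(["git", "ls-remote", "--heads", repo_url])
    return {
        "ping": ["ping", "-c", "1", host],
        "ssh": ssh + ["true"],
        "git": ssh + [remote_git],
    }


def check_slave(
    host: str,
    repo_url: str,
    runner: Runner = run_command,
    timeout: float = 30.0,
) -> Dict[str, bool]:
    """Run the checks in order; once one fails, the rest are skipped and marked failed."""
    results: Dict[str, bool] = {}
    ok = True
    for name, argv in build_checks(host, repo_url).items():
        ok = ok and runner(argv, timeout)
        results[name] = ok
    return results
```

A nightly job could call something like `check_slave("slave01.example.org", "git://example.org/some-repo.git")` for each slave and report any entry that comes back False. Taking the slave out of the pool is deliberately left out, since that part depends on Jenkins itself.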

Adding something like that requires tampering with Jenkins internals (or
wrapping a job inside a job, or similar), so it's not easy to do per job.
Jenkins itself should be doing connectivity checks periodically and taking
slaves out of the pool if they are unreachable, unresponsive (if the ping is
not fast enough), or if the disk/swap is filling up.

The slave log shows that it was connected after the job ran. I'll try to
figure out what happened to it before that.

>
> e.
>
> ----- Original Message -----
> > From: "Yevgeny Zaspitsky" <[email protected]>
> > To: "David Caro" <[email protected]>, "Eyal Edri" <[email protected]>
> > Cc: [email protected]
> > Sent: Tuesday, February 10, 2015 6:15:40 PM
> > Subject: Re: Jenkins failures
> >
> > Here is an example of a git failure on a Jenkins node:
> > http://jenkins.ovirt.org/job/ovirt-engine_master_find-bugs_gerrit/26316/console
> >
> > On 05/02/15 16:07, David Caro wrote:
> > > Also take into account that on Monday/Tuesday we had a major outage on
> > > jenkins, and all the slaves behaved unreliably, if they worked at all.
> > >
> > > On 02/05, Eyal Edri wrote:
> > >> Hi,
> > >>
> > >> we'd be more than happy to help and fix those issues.
> > >> can you please provide links and info on specific failures so we can
> > >> debug them?
> > >>
> > >> you're also welcome to open a ticket in our ticketing system [1] to
> > >> track a specific item.
> > >> keep in mind the infra team is limited in resources, so not all tickets
> > >> might be solved quickly,
> > >> especially if a major outage (like we had this week) is in progress.
> > >>
> > >> [1] https://fedorahosted.org/ovirt/newticket
> > >>
> > >> /e
> > >>
> > >> ----- Original Message -----
> > >>> From: "Yevgeny Zaspitsky" <[email protected]>
> > >>> To: [email protected]
> > >>> Sent: Thursday, February 5, 2015 3:59:34 PM
> > >>> Subject: Jenkins failures
> > >>>
> > >>> Hi All,
> > >>>
> > >>> Lately I barely get any valuable input from the Jenkins CI builds on my
> > >>> patches. Throughout the last week most of the builds finished with
> > >>> different Jenkins failures.
> > >>> The reasons were:
> > >>>
> > >>> * git failure
> > >>> * lack of permission to mkdir
> > >>> * failure to retrieve artifacts from the artifactory
> > >>> * unexpected shutdown
> > >>>
> > >>> Such a high rate of failures makes the value of the builds very low and
> > >>> causes me to spend my time on understanding whether it's my fault or
> > >>> not.
> > >>>
> > >>> I'd be very thankful and happier if Jenkins reliability was improved.
> > >>>
> > >>> Regards,
> > >>> Yevgeny
> > >>>
> > >>> _______________________________________________
> > >>> Infra mailing list
> > >>> [email protected]
> > >>> http://lists.ovirt.org/mailman/listinfo/infra
> > >>>
> > >
> >

--
David Caro

Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D

Tel.: +420 532 294 605
Email: [email protected]
Web: www.redhat.com
RHT Global #: 82-62605
