On 02/10, Eyal Edri wrote:
> it seems that that slave isn't responsive to ssh,
> so it might have some sort of infra issue.
> 
> i think we should consider adding some sort of verification process for slaves.
> something that will run nightly or before each job if it's fast enough.
> 
> we could consider checking:
>  - ping
>  - ssh
>  - git clone..
> 
> david, what do you think? might reduce a lot of false positive failures.

Adding something like that requires tampering with Jenkins internals (or wrapping a
job inside a job, or similar), so it's not easy to do per-job.

Jenkins itself should be doing connectivity checks periodically and taking
slaves out of the pool if they are unreachable, unresponsive (if the ping is not
fast enough) or their disk/swap is filling up.
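
For the checks suggested above (ping, ssh, git), a standalone script run
nightly from outside Jenkins might look roughly like the sketch below. It is
only an illustration under assumptions: the function names, timeouts, host and
repository URL are hypothetical placeholders, the ping flags assume Linux
iputils, and `git ls-remote` is used as a cheaper stand-in for a full
`git clone`.

```python
import subprocess
from typing import Callable, Dict, List


def build_checks(host: str, repo_url: str) -> Dict[str, List[str]]:
    """Return the command line for each check, keyed by check name.

    All values here are illustrative assumptions, not oVirt infra settings.
    """
    ssh = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host]
    return {
        # One ICMP echo request, waiting at most 5 seconds (Linux iputils).
        "ping": ["ping", "-c", "1", "-W", "5", host],
        # Non-interactive login that just runs `true` on the slave.
        "ssh": ssh + ["true"],
        # Ask the slave to reach the git server without cloning anything.
        "git": ssh + ["git", "ls-remote", "--heads", repo_url],
    }


def run_command(cmd: List[str], timeout: float = 30.0) -> bool:
    """Run a command quietly; True only if it exits with status 0 in time."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def check_slave(
    host: str,
    repo_url: str,
    runner: Callable[[List[str]], bool] = run_command,
) -> List[str]:
    """Run every check against one slave and return the names that failed."""
    checks = build_checks(host, repo_url)
    return [name for name, cmd in checks.items() if not runner(cmd)]
```

Calling `check_slave("slave01.example.org", "git://example.org/repo.git")`
returns the list of failed check names, so an empty list means the slave
passed. The `runner` argument exists only so the logic can be exercised
without touching the network.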


The slave log shows that it was connected after the job ran; I'll try to figure
out what happened to it before that.


> 
> e.
> 
> ----- Original Message -----
> > From: "Yevgeny Zaspitsky" <[email protected]>
> > To: "David Caro" <[email protected]>, "Eyal Edri" <[email protected]>
> > Cc: [email protected]
> > Sent: Tuesday, February 10, 2015 6:15:40 PM
> > Subject: Re: Jenkins failures
> > 
> > Here is an example for a git failure on a Jenkins node:
> > http://jenkins.ovirt.org/job/ovirt-engine_master_find-bugs_gerrit/26316/console
> > 
> > On 05/02/15 16:07, David Caro wrote:
> > > Also take into account that monday/tuesday we had a major outage on
> > > jenkins and all the slaves behaved unreliably if working at all.
> > >
> > > On 02/05, Eyal Edri wrote:
> > >> Hi,
> > >>
> > >> we'd be more than happy to help and fix those issues.
> > >> can you please provide links and info on specific failures so we can
> > >> debug them?
> > >>
> > >> also, you're welcome to open a ticket in our ticketing system [1] to
> > >> track a specific item.
> > >> keep in mind the infra team is limited in resources, so not all tickets
> > >> might be solved quickly,
> > >> especially if a major outage (like we had this week) is in progress.
> > >>
> > >> [1] https://fedorahosted.org/ovirt/newticket
> > >>
> > >> /e
> > >>
> > >> ----- Original Message -----
> > >>> From: "Yevgeny Zaspitsky" <[email protected]>
> > >>> To: [email protected]
> > >>> Sent: Thursday, February 5, 2015 3:59:34 PM
> > >>> Subject: Jenkins failures
> > >>>
> > >>> Hi All,
> > >>>
> > >>> Lately I barely get any valuable input from the Jenkins CI builds on my
> > >>> patches. Throughout the last week most of the builds finished with
> > >>> different Jenkins failures.
> > >>> The reasons were:
> > >>>
> > >>>
> > >>>      * git failure
> > >>>      * lack of permission to mkdir
> > >>>      * failure to retrieve artifacts from the artifactory
> > >>>      * unexpected shutdown
> > >>>
> > >>> Such a high rate of failures makes the value of the builds very low and
> > >>> causes me to spend my time on understanding whether it's my fault or not.
> > >>>
> > >>> I'd be very thankful and happier if Jenkins reliability was improved.
> > >>>
> > >>> Regards,
> > >>> Yevgeny
> > >>>
> > >>> _______________________________________________
> > >>> Infra mailing list
> > >>> [email protected]
> > >>> http://lists.ovirt.org/mailman/listinfo/infra
> > >>>
> > 
> > 

-- 
David Caro

Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D

Tel.: +420 532 294 605
Email: [email protected]
Web: www.redhat.com
RHT Global #: 82-62605


