On 12/11/2013 06:27 AM, Vincent Ladeuil wrote:
> I spent some time on that issue during my Vanguard shift, see
> https://app.asana.com/0/8740321118011/8740321118013 for details, I'll
> raise some points and ideas below as this is more work than I thought
> and seems worth discussing the various issues and solutions.
>
>
>>>>>> Vincent Ladeuil <[email protected]> writes:
> > Hi,
> > I've discussed that with jamespage and came up with the following
> > workaround:
> >
> > modified debian/jenkins-slave.upstart
> >
> > === modified file 'debian/jenkins-slave.upstart'
> > --- debian/jenkins-slave.upstart 2013-02-17 17:11:13 +0000
> > +++ debian/jenkins-slave.upstart 2013-12-09 10:29:01 +0000
> > @@ -17,3 +17,6 @@
> > exec start-stop-daemon --start -c $JENKINS_USER --exec $JAVA --name jenkins-slave \
> > -- $JAVA_ARGS -jar $JENKINS_RUN/slave.jar $JENKINS_ARGS
> > end script
> > +
> > +# respawn if the slave crash
> > +respawn
> >
> > I've deployed that on jatayu by adding 'respawn' to
> > /etc/init/jenkins-slave.conf so daily-release-executor should now
> > restart automatically (I've restarted the jenkins-slave service).
> >
> ....
>>>>>> Francis Ginther <[email protected]> writes:
> > Vila,
> > My recommendation is to deprecate /usr/local/bin/start-jenkins-slaves
> > and rely on individual upstart jobs, one for each slave.
>
>>>>>> Larry Works <[email protected]> writes:
> > I second the motion for upstart jobs for each individual node.
>
> Looks like we have a consensus on not using
> /usr/local/bin/start-jenkins-slaves.
>
>
>>>>>> Larry Works <[email protected]> writes:
> > I also wouldn't mind seeing us get away from using SSH to restart
> > remote nodes since that will allow us to eliminate another plugin
> > (or three).
>
> Can you elaborate on that? By 'using SSH to restart remote nodes' you
> mean us connecting via ssh and restarting the slaves manually?
>
> Probably not as I fail to see the link with plugins...
>

Some of the Jenkins slave nodes (mostly, but not only, VMs) are started
from the Jenkins master through the ssh-slaves plugin. Installing the
jenkins-slave package on ALL nodes and starting each slave from the node
itself, rather than from the master via ssh-slaves, would eliminate the
need for the ssh-slaves plugin as well as the credentials and
ssh-credentials plugins. We can still use the libvirt-slaves plugin to
launch the VMs as needed and shut them down when they are not (it also
reverts the VMs to a saved snapshot state and helps lessen the system
load on the VM hosting server).

>>>>>> Evan Dandrea <[email protected]> writes:
> > On 9 December 2013 13:38, Vincent Ladeuil <[email protected]> wrote:
> >> === modified file 'debian/jenkins-slave.upstart'
> >> --- debian/jenkins-slave.upstart 2013-02-17 17:11:13 +0000
> >> +++ debian/jenkins-slave.upstart 2013-12-09 10:29:01 +0000
> >> @@ -17,3 +17,6 @@
> >> exec start-stop-daemon --start -c $JENKINS_USER --exec $JAVA --name jenkins-slave \
> >> -- $JAVA_ARGS -jar $JENKINS_RUN/slave.jar $JENKINS_ARGS
> >> end script
> >> +
> >> +# respawn if the slave crash
> >> +respawn
>
> > respawn limit (http://upstart.ubuntu.com/cookbook/#respawn-limit)
> > please.
>
> Yup, that was (and still is) on my radar, see
> https://app.asana.com/0/8740321118011/9113941145531 .
>
> > Otherwise we will poorly handle the case where the slave is broken
> > (remember the corrupted jar?) and cannot actually be started.
>
> I vaguely remember but no details, what was the symptom, how can we
> automate a check for that?
>
> See https://app.asana.com/0/8740321118011/9113941145533 for a proposal
> to check the jar validity, feedback welcome.
>
> Now, I stopped counting at 40 when listing all nodes where we want to do
> that (see https://app.asana.com/0/8740321118011/9113941145537).
>
> 40 is too high for a manual fix and deploy strategy :-/
>
> And at that point I wonder if we really want to keep using JNLP or if
> it's worth choosing a different way to connect to the slaves. jenkins
> proposes two other methods:
>
> - launch slave agents on Unix machines by using ssh
> - launch slave via execution of command on the Master

I have not tried the second of these two methods, but the first runs
counter to my earlier comments about using ssh to start slave nodes: it
requires three plugins (and, I believe, we are trying to limit our
reliance on plugins as much as possible). We have also had issues in the
recent past with slave nodes started via the ssh-slaves plugin having
trouble posting their artifacts back to the Jenkins master.

>
> My understanding (and practice on http://babune.ladeuil.net:24842) is
> that the master can (and will) restart the connection when needed
> (including when it's lost), so it may be a better fit[1] than addressing
> all the issues we're encountering with JNLP.
>
> Thoughts?
>
> In a nutshell, I feel that we'd be better served in the short term by
> restarting the crashed slaves manually with an option of adding
> 'respawn' when we do that; and postpone the better resolution.
>
> Vincent
>
> [1]: That needs to be tested first of course.
>
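
On automating a check for a corrupted slave.jar: the snippet below is a
minimal sketch of one possible approach, not necessarily what the Asana
proposal describes. It assumes that "valid" means the file is a readable
zip archive whose members pass their CRC checks and which contains a
manifest. The helper name jar_is_valid is made up for illustration.

```python
import zipfile
import zlib


def jar_is_valid(path):
    """Return True if `path` looks like an intact jar file.

    Assumption: "intact" means a readable zip archive whose members all
    pass their CRC check and which contains META-INF/MANIFEST.MF.
    """
    # is_zipfile() returns False for missing or non-zip files.
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as jar:
            # testzip() returns the name of the first bad member, or None.
            if jar.testzip() is not None:
                return False
            # A jar started with `java -jar` needs a manifest.
            return "META-INF/MANIFEST.MF" in jar.namelist()
    except (zipfile.BadZipFile, zlib.error, OSError):
        # Any read or decompression failure counts as a corrupted jar.
        return False
```

Something like this could run before the slave is started, so that a
broken jar is reported once instead of being respawned over and over.
It does not replace 'respawn limit', it only catches one known failure.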
-- 
Mailing list: https://launchpad.net/~canonical-ci-engineering
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~canonical-ci-engineering
More help   : https://help.launchpad.net/ListHelp

