Hello James

>Regarding the problem of Jenkins going offline, my feeling would be to 
>mitigate this risk by: (1) Warning and doing a qdel on all Jenkins-submitted 
>jobs when Jenkins shuts down cleanly. (2) Warning and doing a qdel on all 
>lingering Jenkins jobs when Jenkins comes back up after an unclean shutdown.


That's the beauty of durable tasks and similar approaches: we won't have to 
qdel PBS jobs. Jenkins will reconnect automagically with the cluster and will 
continue monitoring the job till it is over.

>I don't think I understand the durable-task solution, as to me that sounds 
>like one would not be able to move existing freestyle jobs back and forth 
>between conventional nodes and batch system nodes... It seems to me that the 
>right solution would be a new kind of slave node, rather than a new kind of 
>job, though I'm happy to be corrected...


I'm still learning about the durable-task solution (and keeping track of what I 
understand about it [1]). But IIUC we could still use the slave node to 
represent the PBS cluster. The problem lies in the way we trigger PBS jobs and 
keep track of what's running. 

At the moment you can use a Freestyle project in Jenkins to qsub a job to the 
cluster. But if anything happens to Jenkins while the script is running in the 
cluster, you may lose that build and then have to connect to the cluster, or 
look for the job in the PBS queues listed in Jenkins, to find out what happened.

With the durable-task solution, or a similar approach, Jenkins would pick the 
job up again for us after a restart. Yesterday a researcher submitted a 
pull request [2] to make the Freestyle project run for as long as the PBS job is 
running in the cluster.
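
To make the "revive the job after a restart" idea concrete, here is a minimal 
sketch in Python. It is not the durable-task plugin API and not what the pull 
request does; it only illustrates the general pattern of writing the job id to 
disk right after submission, so a restarted process can read it back and keep 
polling instead of resubmitting or calling qdel. The function names, the JSON 
file layout, and the injected qsub/is_running callables are all assumptions 
made for illustration.

```python
import json
import os
import time


def submit(store_path, qsub):
    """Submit once and persist the job id before doing anything else.

    `qsub` is a hypothetical callable returning the job id as a string
    (in a real setup it might wrap the qsub command, which prints the
    job identifier on stdout).
    """
    job_id = qsub()
    with open(store_path, "w") as fh:
        json.dump({"job_id": job_id}, fh)
    return job_id


def resume(store_path):
    """Return the job id recorded before a restart, or None if there is none."""
    if not os.path.exists(store_path):
        return None
    with open(store_path) as fh:
        return json.load(fh)["job_id"]


def wait_for(job_id, is_running, poll_seconds=30.0, sleep=time.sleep):
    """Poll until the scheduler stops reporting the job as running.

    `is_running` is a hypothetical callable (for example a wrapper around
    qstat). Returns the number of polls that saw the job still running.
    """
    polls = 0
    while is_running(job_id):
        polls += 1
        sleep(poll_seconds)
    return polls
```

After a restart, the caller would try resume() first and only call submit() 
when it returns None, so the PBS job keeps running untouched while Jenkins is 
away.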

I will try to write a prototype that combines a durable Jenkins+PBS approach 
with what he proposed, and cut a new release by next weekend (though with the 
World Cup that could take a few more days :-)

Bruno

[1] http://tupilabs.com/2014/06/13/durable-tasks-in-jenkins.html

[2] https://github.com/biouno/pbs-plugin/pull/2


>________________________________
> From: James Hetherington <[email protected]>
>To: [email protected] 
>Sent: Thursday, June 12, 2014 9:38 AM
>Subject: Re: Jenkins plugin for HPC job systems
> 
>
>
>I had a look at Bruno's plugin -- it looks like a great starting point for 
>what we need.
>
>I think generalising Bruno's PBS Java API to support other similar systems 
>with qsub/qstat type commands should work well: it seems that a 
>PBSSlaveComputer could be generalised into a BatchSystemSlave.
>
>
>Regarding the problem of Jenkins going offline, my feeling would be to 
>mitigate this risk by: (1) Warning and doing a qdel on all Jenkins-submitted 
>jobs when Jenkins shuts down cleanly. (2) Warning and doing a qdel on all 
>lingering Jenkins jobs when Jenkins comes back up after an unclean shutdown.
>
>
>I don't think I understand the durable-task solution, as to me that sounds 
>like one would not be able to move existing freestyle jobs back and forth 
>between conventional nodes and batch system nodes... It seems to me that the 
>right solution would be a new kind of slave node, rather than a new kind of 
>job, though I'm happy to be corrected...
>
>
>
>
>
>On Thu, Jun 12, 2014 at 1:29 AM, 'Bruno P. Kinoshita' via Jenkins Developers 
><[email protected]> wrote:
>
>>Hi Jesse,
>>
>>
>>I'm waiting for the workflow plugin but hadn't heard about the durable tasks 
>>plugin. 
>>
>>
>>The current implementation of the pbs plugin [1] was a POC to submit jobs via 
>>the qsub command to a PBS Torque server. In our tests a PBS job was triggered 
>>from a Freestyle project to the server via a SSH jump box in a university 
>>cluster. It is working and creating new jobs.
>>
>>
>>However, indeed if Jenkins goes offline the build stops running, even though 
>>the PBS job might still be running. I thought about re-using the 
>>monitor-external-job plug-in, but in some places qsub might be the only 
>>option to submit jobs.
>> 
>>I will experiment with the durable task plugin. Any advice on how to add 
>>items created in the cluster to the build queue in Jenkins? 
>>
>>
>>The current implementation creates a PBSSlaveComputer that represents a PBS 
>>Server. A Widget is created to retrieve the list of queues and its jobs from 
>>the server.
>>
>>
>>Thanks!
>>Bruno
>>
>>
>>[1] https://github.com/biouno/pbs-plugin
>>
>>
>>
>>>________________________________
>>> From: Jesse Glick <[email protected]>
>>>To: [email protected] 
>>>Sent: Wednesday, June 11, 2014 1:11 PM
>>>Subject: Re: Jenkins plugin for HPC job systems
>>> 
>>>
>>>On Wed, Jun 11, 2014 at 10:05 AM, James Hetherington <[email protected]> 
>>>wrote:
>>>> Is anyone aware of a plugin which already does this?
>>>
>>>No but I have heard of someone interested in SGE support. My idea was
>>>to implement the durable-task-plugin API, at which point any client of
>>>that plugin can use the system (and Jenkins does not need to be
>>>continuously running while the scheduling system runs your batch job).
>>>Jenkins Enterprise by CloudBees has one such client, which looks and
>>>feels like a freestyle project; the upcoming Workflow plugin suite has
>>>another caller, which is used routinely for running forked commands
>>>like shell scripts.
>>>
>>>
>>>-- 
>>>You received this message because you are subscribed to the Google Groups 
>>>"Jenkins Developers" group.
>>>To unsubscribe from this group and stop receiving emails from it, send an 
>>>email to [email protected].
>>>
>>>For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>>
>
