On Friday, October 28, 2016 at 9:36:49 PM UTC-7, John Calsbeek wrote:
>
>
> Shared storage is a potential option, yes, but the tasks in question are 
> currently not very fault-tolerant when it comes to network hitches.
>

Well, it would pay to make them more fault-tolerant :-) But even if you do 
not fix the process, you do not have to run it from the shared storage - 
just use it as storage. With a node-local mirror you can rsync the data 
down from shared storage, run the task, then rsync the results back. 
Assuming your data does not change much (which I understand is not always 
the case), you will soon have a relatively recent copy on every node, 
which cuts down the amount of data being moved. It may or may not work in 
your case, but it is something to consider.
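
Something like this inside the pipeline, say (a rough, untested sketch - 
the label and paths here are made up for illustration):

node('data-node-1') {
    // Pull only what changed since last time into a node-local mirror.
    sh 'rsync -a /mnt/shared/dataset/ /var/local/mirror/dataset/'

    // Run the actual task against the local copy.
    sh './run-task.sh /var/local/mirror/dataset'

    // Push any changes back to shared storage.
    sh 'rsync -a /var/local/mirror/dataset/ /mnt/shared/dataset/'
}

The first sync onto a fresh node is the expensive one; after that rsync 
only moves the deltas.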
 

>  
>  
>
>> But more to the point, if your main issue is that you are worried that a 
>> node may be unavailable, you may consider some automatic node allocation. I 
>> am not sure if there are other examples, but for example the AWS node 
>> allocation can automatically allocate a new node if no threads are 
>> available for a label. That may be a decent backup strategy. If you are not 
>> using AWS - you can probably look if there is another node provisioning 
>> plugin that fits or if not, look at how they do that and write your own 
>> plugin to do it
>>
>
> Assuming that we have a fixed amount of computing resources, does this 
> have any advantage over writing a LoadBalancer plugin?
>

If you are allocating your nodes on demand instead of pre-creating them, 
you do not have to keep a big shared pool: specific nodes are allocated 
with the same label only as needed, and as old nodes die and are 
decommissioned, their resources re-join the pool of what is available. Of 
course, if you do need the affinity requirement, just using them all as a 
pool is probably easier.

 
>
>> But maybe I am overthinking it. In the end, if your primary concern is 
>> that node may be down - remember that pipeline is groovy code - groovy code 
>> that has access to the Jenkins API/internals. You can write some code that 
>> will check the state of the slaves and select a label to use before you 
>> even get to the node() statement. Sure, that will not fix the issue of a 
>> node going down in a middle of a job, but may catch the job before it 
>> assigns a task to a dead node.
>>
>
> Ah, that's an interesting idea. Something that I forgot to mention in the 
> original post is that if there was a node() function that allocates with a 
> timeout, that would also be a building block that we could use to fix this 
> problem. (If attempting to allocate a specific node fails with a timeout, 
> then schedule on a fallback. timeout() doesn't work because that would 
> apply the timeout to the task as well, not merely to the attempt to 
> allocate the node.) We could indeed query the status of nodes directly. I 
> have a niggling doubt that it would be possible to do this without a race 
> condition (what if the node goes down between querying its status and 
> scheduling on it?), but it's definitely something worth investigating.
>
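
On querying node status directly - a starting point could look roughly 
like this (untested sketch; 'data-node-1' and 'fallback' are made-up 
names, and the Jenkins API calls will need script approval if you run 
sandboxed). It does not close the race you mention, it only narrows the 
window:

import jenkins.model.Jenkins

// Returns true if the named agent exists and is currently online.
@NonCPS
boolean isOnline(String name) {
    def c = Jenkins.instance.getComputer(name)
    return c != null && c.isOnline()
}

// Prefer the node holding the data, otherwise fall back to a generic label.
def label = isOnline('data-node-1') ? 'data-node-1' : 'fallback'
node(label) {
    // ... the actual task ...
}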

I am wondering if you can do some weird combination of parallel + sleep + 
failFast + try/catch to emulate a timeout for a specific task.
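
Something along these lines, maybe (rough and untested; 'data-node-1' and 
'fallback' are made-up labels):

def allocated = false
try {
    parallel(
        failFast: true,
        work: {
            node('data-node-1') {
                allocated = true
                // ... the actual task ...
            }
        },
        watchdog: {
            // Give the preferred node a few minutes to pick up the work.
            sleep time: 5, unit: 'MINUTES'
            if (!allocated) {
                error 'data-node-1 did not pick up the task in time'
            }
        }
    )
} catch (err) {
    if (allocated) {
        throw err            // the task itself failed - not an allocation problem
    }
    node('fallback') {
        // ... the actual task, on a fallback node ...
    }
}

The ugly part is that the watchdog branch keeps sleeping even when the 
work branch finishes early, so the build will not end before the watchdog 
wakes up - it is a crude emulation, not a real allocation timeout.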

>  
>
>> Alternatively, you can simply write another job, in lieu of a plugin, 
>> that will scan all your tasks and nodes and if it detects a node down and a 
>> task waiting for it, assign the label to another node from the "standby" 
>> pool
>>
>
> This is an idea that we had considered, yeah, although I was considering 
> it as a first step in the pipeline before scheduling, which made me nervous 
> about race conditions. But if, as you suggest, it was a frequently run job 
> which is always attempting to set up node allocations… that could 
> definitely work. Good suggestion, thanks!
>  
>
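
And if you do go with the scanning job, the reassignment itself is only a 
few API calls from a system Groovy script - roughly like this (untested 
sketch; the node names and label are made up, and it assumes the standby 
is a regular Slave agent):

import jenkins.model.Jenkins

def jenkins = Jenkins.instance
def dead    = jenkins.getNode('data-node-1')      // node that owns the data label
def standby = jenkins.getNode('standby-node-3')   // node from the standby pool

if (dead?.toComputer()?.isOffline() && standby != null) {
    // Move the per-dataset label onto the standby node so queued work can run there.
    standby.setLabelString(standby.labelString + ' dataset-label')
    jenkins.save()
}
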
Throw enough things against a wall, something will stick ;-)  Glad to be of 
help.

Good luck.

 -M
