The task-option is interesting. Let's say there are 8 CPU cores. Julia's 
ncpus() returns 9 when started with `julia -p 8`, that is to be expected. 
All 8 cores are 100% loaded during the pmap call. Would 
`interrupt(workers())` leave one running?

On Wednesday, April 29, 2015 at 8:48:15 PM UTC-7, Amit Murthy wrote:
>
> Your solution seems reasonable enough.
>
> Another solution : You could schedule a task in your julia code which will 
> interrupt the workers after a timeout
> @schedule begin
>   sleep(600)
>   if pmap_not_complete
>      interrupt(workers())
>   end
> end
>
> Start this task before executing the pmap
>
> Note that this will work only for additional processes created on the 
> local machine. For SSH workers, `interrupt` is a message sent to the remote 
> workers, which will be unable to process it if the main thread is 
> computation bound.  
>
>
>
> On Thu, Apr 30, 2015 at 9:08 AM, Pavel <[email protected] <javascript:>
> > wrote:
>
>> Here is my current bash-script (same timeout-way due to the lack of 
>> alternative suggestions):
>>
>>     timeout 600 julia -p $(nproc) juliacode.jl >>results.log 2>&1
>>     killall -9 -v julia >>cleanup.log 2>&1
>>
>> Does that seem reasonable? Perhaps Linux experts may think of some 
>> scenarios where this would not be sufficient as far as the 
>> runaway/non-responding process cleanup?
>>
>>
>>
>> On Thursday, April 2, 2015 at 12:15:33 PM UTC-7, Pavel wrote:
>>>
>>> What would be a good way to limit the total runtime of a multicore 
>>> process managed by pmap?
>>>
>>> I have pmap processing a collection of optimization runs (with fminbox) 
>>> and most of the time everything runs smoothly. On occasion however 1-2 out 
>>> of e.g. 8 CPUs take too long to complete one optimization, and 
>>> fminbox/conj. grad. does not have a way to limit run time as recently 
>>> discussed:
>>>
>>> http://julia-programming-language.2336112.n4.nabble.com/fminbox-getting-quot-stuck-quot-td12163.html
>>>
>>> To deal with this in a crude way, at the moment I call Julia from a 
>>> shell (bash) script with timeout:
>>>
>>>     timeout 600 julia -p 8 juliacode.jl
>>>
>>> When doing this, is there anything to help find and stop 
>>> zombie-processes (if any) after timeout forces a multicore pmap run to 
>>> terminate? Anything within Julia related to how the processes are spawned? 
>>> Any alternatives to shell timeout? I know NLopt has a time limit option but 
>>> that is not implemented within Julia (but in the underlying C-library).
>>>
>>>
>

Reply via email to