`interrupt` will work for local workers as well as SSH ones. I had mentioned otherwise above.
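
For reference, here is a minimal sketch of the watchdog pattern discussed in the quoted thread below; `solve_case`, `cases`, and the 600-second timeout are illustrative placeholders, not names from the original code:

    # Watchdog task: interrupt the workers if pmap has not finished in time.
    pmap_not_complete = true

    @schedule begin
        sleep(600)                  # timeout in seconds
        if pmap_not_complete
            interrupt(workers())    # roughly equivalent to sending SIGINT to each worker
        end
    end

    results = pmap(solve_case, cases)
    pmap_not_complete = false

Tasks interrupted this way terminate with an InterruptException on the workers, while the worker processes themselves remain in a running state afterwards.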
On Thu, Apr 30, 2015 at 12:08 PM, Amit Murthy <[email protected]> wrote:

> `interrupt(workers())` is the equivalent of sending a SIGINT to the
> workers. The tasks which are consuming 100% CPU are interrupted, and they
> terminate with an InterruptException.
>
> All processes are still in a running state after this.
>
> On Thu, Apr 30, 2015 at 10:02 AM, Pavel <[email protected]> wrote:
>
>> The task option is interesting. Let's say there are 8 CPU cores. Julia's
>> nprocs() returns 9 when started with `julia -p 8`, which is to be
>> expected. All 8 cores are 100% loaded during the pmap call. Would
>> `interrupt(workers())` leave one running?
>>
>> On Wednesday, April 29, 2015 at 8:48:15 PM UTC-7, Amit Murthy wrote:
>>>
>>> Your solution seems reasonable enough.
>>>
>>> Another solution: you could schedule a task in your Julia code which
>>> will interrupt the workers after a timeout:
>>>
>>>     @schedule begin
>>>         sleep(600)
>>>         if pmap_not_complete
>>>             interrupt(workers())
>>>         end
>>>     end
>>>
>>> Start this task before executing the pmap.
>>>
>>> Note that this will work only for additional processes created on the
>>> local machine. For SSH workers, `interrupt` is a message sent to the
>>> remote workers, which will be unable to process it if the main thread is
>>> computation bound.
>>>
>>> On Thu, Apr 30, 2015 at 9:08 AM, Pavel <[email protected]> wrote:
>>>
>>>> Here is my current bash script (the same timeout approach, for lack of
>>>> alternative suggestions):
>>>>
>>>>     timeout 600 julia -p $(nproc) juliacode.jl >>results.log 2>&1
>>>>     killall -9 -v julia >>cleanup.log 2>&1
>>>>
>>>> Does that seem reasonable? Perhaps Linux experts can think of scenarios
>>>> where this would not be sufficient as far as cleaning up runaway or
>>>> non-responding processes.
>>>>
>>>> On Thursday, April 2, 2015 at 12:15:33 PM UTC-7, Pavel wrote:
>>>>>
>>>>> What would be a good way to limit the total runtime of a multicore
>>>>> process managed by pmap?
>>>>>
>>>>> I have pmap processing a collection of optimization runs (with
>>>>> fminbox), and most of the time everything runs smoothly. On occasion,
>>>>> however, 1-2 out of e.g. 8 CPUs take too long to complete one
>>>>> optimization, and fminbox/conjugate gradient does not have a way to
>>>>> limit run time, as recently discussed:
>>>>> http://julia-programming-language.2336112.n4.nabble.com/fminbox-getting-quot-stuck-quot-td12163.html
>>>>>
>>>>> To deal with this in a crude way, at the moment I call Julia from a
>>>>> shell (bash) script with timeout:
>>>>>
>>>>>     timeout 600 julia -p 8 juliacode.jl
>>>>>
>>>>> When doing this, is there anything to help find and stop zombie
>>>>> processes (if any) after timeout forces a multicore pmap run to
>>>>> terminate? Anything within Julia related to how the processes are
>>>>> spawned? Any alternatives to shell timeout? I know NLopt has a
>>>>> time-limit option, but that is not implemented within Julia (it is in
>>>>> the underlying C library).
