`interrupt` will work for local workers as well as SSH ones. I had mentioned otherwise above.
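
For reference, here is a minimal sketch of the watchdog pattern discussed in the quoted thread below; `solve_case`, `cases`, and the 600-second timeout are illustrative placeholders, not names from the original code:

    # Watchdog task: interrupt the workers if pmap has not finished in time.
    pmap_not_complete = true

    @schedule begin
        sleep(600)                  # timeout in seconds
        if pmap_not_complete
            interrupt(workers())    # roughly equivalent to sending SIGINT to each worker
        end
    end

    results = pmap(solve_case, cases)
    pmap_not_complete = false

Tasks interrupted this way terminate with an InterruptException on the workers, while the worker processes themselves remain in a running state afterwards.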
On Thu, Apr 30, 2015 at 12:08 PM, Amit Murthy <[email protected]> wrote:

> `interrupt(workers())` is the equivalent of sending a SIGINT to the
> workers. The tasks which are consuming 100% CPU are interrupted, and they
> terminate with an InterruptException.
>
> All processes are still in a running state after this.
>
> On Thu, Apr 30, 2015 at 10:02 AM, Pavel <[email protected]> wrote:
>
>> The task option is interesting. Let's say there are 8 CPU cores. Julia's
>> nprocs() returns 9 when started with `julia -p 8`, which is to be
>> expected. All 8 cores are 100% loaded during the pmap call. Would
>> `interrupt(workers())` leave one running?
>>
>> On Wednesday, April 29, 2015 at 8:48:15 PM UTC-7, Amit Murthy wrote:
>>>
>>> Your solution seems reasonable enough.
>>>
>>> Another solution: you could schedule a task in your Julia code which
>>> will interrupt the workers after a timeout:
>>>
>>>     @schedule begin
>>>         sleep(600)
>>>         if pmap_not_complete
>>>             interrupt(workers())
>>>         end
>>>     end
>>>
>>> Start this task before executing the pmap.
>>>
>>> Note that this will work only for additional processes created on the
>>> local machine. For SSH workers, `interrupt` is a message sent to the
>>> remote workers, which will be unable to process it if the main thread is
>>> computation bound.
>>>
>>> On Thu, Apr 30, 2015 at 9:08 AM, Pavel <[email protected]> wrote:
>>>
>>>> Here is my current bash script (the same timeout approach, for lack of
>>>> alternative suggestions):
>>>>
>>>>     timeout 600 julia -p $(nproc) juliacode.jl >>results.log 2>&1
>>>>     killall -9 -v julia >>cleanup.log 2>&1
>>>>
>>>> Does that seem reasonable? Perhaps Linux experts can think of scenarios
>>>> where this would not be sufficient as far as cleaning up runaway or
>>>> non-responding processes.
>>>>
>>>> On Thursday, April 2, 2015 at 12:15:33 PM UTC-7, Pavel wrote:
>>>>>
>>>>> What would be a good way to limit the total runtime of a multicore
>>>>> process managed by pmap?
>>>>>
>>>>> I have pmap processing a collection of optimization runs (with
>>>>> fminbox), and most of the time everything runs smoothly. On occasion,
>>>>> however, 1-2 out of e.g. 8 CPUs take too long to complete one
>>>>> optimization, and fminbox/conjugate gradient does not have a way to
>>>>> limit run time, as recently discussed:
>>>>> http://julia-programming-language.2336112.n4.nabble.com/fminbox-getting-quot-stuck-quot-td12163.html
>>>>>
>>>>> To deal with this in a crude way, at the moment I call Julia from a
>>>>> shell (bash) script with timeout:
>>>>>
>>>>>     timeout 600 julia -p 8 juliacode.jl
>>>>>
>>>>> When doing this, is there anything to help find and stop zombie
>>>>> processes (if any) after timeout forces a multicore pmap run to
>>>>> terminate? Anything within Julia related to how the processes are
>>>>> spawned? Any alternatives to shell timeout? I know NLopt has a
>>>>> time-limit option, but that is not implemented within Julia (it is in
>>>>> the underlying C library).
