The task-option is interesting. Let's say there are 8 CPU cores. Julia's ncpus() returns 9 when started with `julia -p 8`, that is to be expected. All 8 cores are 100% loaded during the pmap call. Would `interrupt(workers())` leave one running?
On Wednesday, April 29, 2015 at 8:48:15 PM UTC-7, Amit Murthy wrote: > > Your solution seems reasonable enough. > > Another solution : You could schedule a task in your julia code which will > interrupt the workers after a timeout > @schedule begin > sleep(600) > if pmap_not_complete > interrupt(workers()) > end > end > > Start this task before executing the pmap > > Note that this will work only for additional processes created on the > local machine. For SSH workers, `interrupt` is a message sent to the remote > workers, which will be unable to process it if the main thread is > computation bound. > > > > On Thu, Apr 30, 2015 at 9:08 AM, Pavel <[email protected] <javascript:> > > wrote: > >> Here is my current bash-script (same timeout-way due to the lack of >> alternative suggestions): >> >> timeout 600 julia -p $(nproc) juliacode.jl >>results.log 2>&1 >> killall -9 -v julia >>cleanup.log 2>&1 >> >> Does that seem reasonable? Perhaps Linux experts may think of some >> scenarios where this would not be sufficient as far as the >> runaway/non-responding process cleanup? >> >> >> >> On Thursday, April 2, 2015 at 12:15:33 PM UTC-7, Pavel wrote: >>> >>> What would be a good way to limit the total runtime of a multicore >>> process managed by pmap? >>> >>> I have pmap processing a collection of optimization runs (with fminbox) >>> and most of the time everything runs smoothly. On occasion however 1-2 out >>> of e.g. 8 CPUs take too long to complete one optimization, and >>> fminbox/conj. grad. does not have a way to limit run time as recently >>> discussed: >>> >>> http://julia-programming-language.2336112.n4.nabble.com/fminbox-getting-quot-stuck-quot-td12163.html >>> >>> To deal with this in a crude way, at the moment I call Julia from a >>> shell (bash) script with timeout: >>> >>> timeout 600 julia -p 8 juliacode.jl >>> >>> When doing this, is there anything to help find and stop >>> zombie-processes (if any) after timeout forces a multicore pmap run to >>> terminate? Anything within Julia related to how the processes are spawned? >>> Any alternatives to shell timeout? I know NLopt has a time limit option but >>> that is not implemented within Julia (but in the underlying C-library). >>> >>> >
