I wrote a similar package long ago for Python and remember SGE array jobs well.
If ClusterManagers' addprocs_sge function doesn't respect the current working directory in the worker processes, it would be nice to file an issue about it. It would also be really nice to have your code integrated more tightly with ClusterManagers rather than existing as a separate package.

Thanks,

Jiahao Chen
Staff Research Scientist
MIT Computer Science and Artificial Intelligence Laboratory

On Thu, Mar 13, 2014 at 4:07 AM, David van Leeuwen <[email protected]> wrote:
> Hello,
>
> I've got a tiny package that makes certain Sun Grid Engine array
> processing jobs easier with Julia. I've put it up at
>
>     Pkg.clone("https://github.com/davidavdav/SGEArrays.jl.git")
>
> The premise is that your main Julia script needs to process a large
> number of files, which are given as a list.
>
> Rather than splitting the files into separate lists outside Julia, and
> spawning an array of jobs calling the Julia script with a different
> list of files for every job, this splitting is done in an iterator.
>
> Your main Julia script `bin/julia-script` could look like
>
>     using SGEArrays
>
>     listfile = ARGS[1]
>     files = readdlm(listfile)
>
>     for file in SGEArray(files)
>         ## process file $file
>     end
>
> i.e., `SGEArray(files)` replaces the bit where you would normally have
> `files`. Calling the script as an SGE task array of size 80 would go
> like:
>
>     find data/input/ -type f > filelist
>     qsub -t 1-80 -b y -cwd bin/julia-script filelist
>
> but the code would also work outside SGE:
>
>     bin/julia-script filelist
>
> For certain computing tasks I find this somewhat easier than using
> ClusterManagers.addprocs_sge, which also doesn't seem to respect the
> current working directory in the worker processes.
>
> Cheers,
>
> ---david
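For readers unfamiliar with SGE task arrays: the splitting David describes can be sketched in a few lines of Python (in the spirit of the similar Python package mentioned above). This is not SGEArrays.jl's actual code; the function name and the striding scheme are illustrative. What is real is that for a `-t 1-N` array, SGE sets `SGE_TASK_ID` (this task's 1-based index) and `SGE_TASK_LAST` (the highest task index) in each task's environment, and sets `SGE_TASK_ID` to the string `"undefined"` for non-array jobs.

```python
import os

def sge_array(items):
    """Yield only this SGE task's share of items.

    Inside a task array submitted with `qsub -t 1-N`, SGE sets
    SGE_TASK_ID (this task, 1-based) and SGE_TASK_LAST (N).
    Outside SGE, SGE_TASK_ID is unset (or "undefined" for a
    non-array job), in which case we yield everything, so the
    same script also runs standalone.
    """
    task = os.environ.get("SGE_TASK_ID", "undefined")
    if task == "undefined":
        yield from items
        return
    task_id = int(task)
    ntasks = int(os.environ["SGE_TASK_LAST"])
    # Stride through the list: task i handles items at indices
    # i-1, i-1+N, i-1+2N, ... so the N tasks partition the list.
    for idx in range(task_id - 1, len(items), ntasks):
        yield items[idx]
```

With this sketch, task 2 of a 3-task array over ten files would process files 2, 5, and 8 of the list, while a plain invocation outside SGE would process all ten.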
