Hello,
I've got a tiny package that makes certain Sun Grid Engine array processing
jobs easier with Julia. I've put it up at
Pkg.clone("https://github.com/davidavdav/SGEArrays.jl.git")
The premise is that your main Julia script needs to process a large number
of files, which are given as a list.
Rather than splitting the files into separate lists outside Julia, and
spawning an array of jobs calling the Julia script with a different list
of files for every job, this splitting is done in an iterator.
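To make the idea concrete, here is a minimal sketch of how such a split could work. It is not the code in SGEArrays.jl: the helper names `task_share` and `sge_task` are made up for illustration, the interleaved (round-robin) split is just one possible choice, and I assume a recent Julia (1.x) and a task array that starts at 1 with step size 1. The environment variables `SGE_TASK_ID` and `SGE_TASK_LAST` are the ones SGE sets for array tasks.

```julia
# Hypothetical helper, not the SGEArrays.jl implementation.
# Returns the items that task `id` out of `ntasks` should process,
# using an interleaved (round-robin) split.
function task_share(items::AbstractVector, id::Integer, ntasks::Integer)
    1 <= id <= ntasks || throw(ArgumentError("task id must be in 1:ntasks"))
    return items[id:ntasks:end]
end

# Hypothetical helper: read the task position from the environment.
# Assumes the array was submitted starting at 1 with step size 1.
# Falls back to a single task when the variables are missing or not
# numeric, which is the case outside an SGE task array.
function sge_task(env = ENV)
    id = tryparse(Int, get(env, "SGE_TASK_ID", ""))
    ntasks = tryparse(Int, get(env, "SGE_TASK_LAST", ""))
    (id === nothing || ntasks === nothing) && return (1, 1)
    return (id, ntasks)
end

# Made-up file names, for illustration only.
files = ["a.wav", "b.wav", "c.wav", "d.wav", "e.wav"]
id, ntasks = sge_task()
for file in task_share(files, id, ntasks)
    println("task $id processes $file")
end
```

With this kind of split every file is handled by exactly one task, and a run without the SGE variables simply processes the whole list.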
Your main julia script `bin/julia-script` could look like
using SGEArrays
listfile = ARGS[1]
files = readdlm(listfile)
for file in SGEArray(files)
    ## process file $file
end
i.e., the `SGEArray(files)` replaces the bit where you would normally have
`files`. Calling the script as an SGE task array of size 80 would go like:
find data/input/ -type f > filelist
qsub -t 1-80 -b y -cwd bin/julia-script filelist
but the code would also work outside SGE:
bin/julia-script filelist
For certain computing tasks I find this somewhat easier than using
ClusterManagers.addprocs_sge, which also doesn't seem to respect the
current working directory in the worker processes.
Cheers,
---david