[julia-users] ANN / RFC: SGEArray iterator

David van Leeuwen Thu, 13 Mar 2014 01:08:29 -0700

Hello, 

I've got a tiny package that makes certain Sun Grid Engine array processing 
jobs easier with Julia.  I've put it up at


Pkg.clone("https://github.com/davidavdav/SGEArrays.jl.git")

The premise is that your main Julia script needs to process a large number 
of files, which are given as a list.  

Rather than splitting the files into separate lists outside Julia, and 
spawning an array of jobs that each call the Julia script with a different 
list of files, the splitting is done in an iterator.  
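
To make the idea concrete, here is a minimal sketch of how such a split could work. It is not the SGEArrays source: the helper names `task_share` and `sge_task_layout` are hypothetical, the interleaved split is just one possible choice, and the task range is assumed to start at 1 with step 1. The environment variables `SGE_TASK_ID` and `SGE_TASK_LAST` are the ones SGE sets for array jobs. The sketch is written for a current Julia version.

```julia
# Hypothetical helper (not part of SGEArrays): the items one task should process.
# Interleaved split: task k of n gets items k, k+n, k+2n, ...
function task_share(items::AbstractVector, taskid::Integer, ntasks::Integer)
    return items[taskid:ntasks:end]
end

# Hypothetical helper: read (task id, number of tasks) from the environment.
# Assumes the task range starts at 1 with step 1, so the number of tasks
# equals SGE_TASK_LAST. Outside SGE the variables are missing (or set to
# "undefined" for non-array jobs), so fall back to a single task.
function sge_task_layout(env=ENV)
    taskid = something(tryparse(Int, get(env, "SGE_TASK_ID", "")), 1)
    ntasks = something(tryparse(Int, get(env, "SGE_TASK_LAST", "")), 1)
    return (taskid, ntasks)
end

# Usage: each task sees only its own share of the full list.
taskid, ntasks = sge_task_layout()
for item in task_share(collect(1:10), taskid, ntasks)
    println("task ", taskid, " processes item ", item)
end
```

With a single task the share is the whole list, which is why the same script can also run unchanged outside SGE.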

Your main Julia script `bin/julia-script` could look like

using SGEArrays
listfile = ARGS[1]
files = readdlm(listfile)
for file in SGEArray(files)
  ## process file $file
end

i.e., `SGEArray(files)` replaces the bit where you would normally have 
`files`.  Calling the script as an SGE task array of size 80 would go like:

find data/input/ -type f > filelist
qsub -t1:80 -b y -cwd bin/julia-script filelist

but the code would also work outside SGE:

bin/julia-script filelist 

For certain computing tasks I find this somewhat easier than using 
ClusterManagers.addprocs_sge, which also doesn't seem to respect the 
current working directory in the worker processes.  

Cheers, 

---david
