I've run parallel julia on a Torque cluster with InfiniBand. I start an 
interactive session with qsub -I, look for the allocated nodes in 
$PBS_NODEFILE, convert the hostnames to IB interface names, and call 
addprocs on them.

# Read the hostnames Torque allocated to this job (one line per core).
# On Julia 0.7 and later, run `using Distributed` first.
linearray = open(readlines, ENV["PBS_NODEFILE"])
# Append the IP-over-InfiniBand suffix; "-ipoib.ipoib" is specific to my site.
strippedarray = [strip(line) * "-ipoib.ipoib" for line in linearray]
# Start one worker per entry in the node file.
for name in strippedarray
    addprocs([name])
end
print(workers())

To start an interactive job, depending on your node configuration and queue 
names:
qsub -I -l nodes=2:ppn=32,walltime=00:30:00 -q normal

Once you get your nodes, start julia and load the setup file above with:
julia --load setupfilename

This should run addprocs and then give you the julia prompt.

But it looks like something is wrong with your modules?
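
For a batch submission instead of an interactive one, here is a rough sketch of the loop from your message, untested on your cluster. It assumes the syntax error comes from bash trying to run include(...) itself, so the Julia code is passed to julia -e instead. The module names, /data, test.jl and test() are copied from your post; write_job is just a helper name I made up.

```shell
#!/bin/bash
# Sketch: generate one Torque job script per index and submit it.
# Module names, walltime, /data and test.jl come from the quoted post.

write_job() {
    local i="$1"
    local script="test1job${i}"
    # Unquoted heredoc delimiter, so ${i} is expanded when the file is written.
    cat > "$script" <<EOF
#!/bin/bash
#PBS -l walltime=00:10:00
cd /data
module add gcc/4.7.2
module add julia/0.2.0
julia -e 'include("test.jl"); test(${i})'
EOF
}

for ((i = 1; i < 10; i++)); do
    write_job "$i"
    # Submit only where qsub exists, so the loop can be dry-run elsewhere.
    if command -v qsub >/dev/null 2>&1; then
        qsub "test1job${i}"
    fi
done
```

Each generated file is an ordinary bash script, so the Julia statements have to go through the julia executable. If the prereq error persists with gcc loaded first, that is something to check with whoever maintains the modules.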

On Friday, April 25, 2014 5:09:57 AM UTC-7, Isaac wrote:
>
> Hi All,
>  
>  I also tried to submit the julia jobs on the cluster but failed. I wrote 
> the job script as follows:
>
> for((i = 1; i < 10; i++))
> do
> echo "# cd /data
> #PBS -l walltime=00:10:00
> module add gcc/4.7.2
> module add julia/0.2.0
> module load julia
> include("test.jl")
> test($i)">test1job$i;
> qsub test1job$i;
> done
>
> I got the errors: 
> julia/0.2.0(16):ERROR:151: Module 'julia/0.2.0' depends on one of the 
> module(s) 'gcc/4.7.2'
> julia/0.2.0(16):ERROR:102: Tcl command execution failed: prereq gcc/4.7.2
>
> /cm/local/apps/torque/current/mom_priv/jobs/1053.cluster.SC: line 7: 
> syntax error near unexpected token `a0d0.jl'
> /cm/local/apps/torque/current/mom_priv/jobs/1053.cluster.SC: line 7: 
> `include(a0d0.jl)'
>
> Does anybody know how to write the job script to submit julia job on a 
> cluster? Could you give an example?
> Thanks in advance.
>
> Isaac