I've run parallel julia on a Torque cluster with Infiniband. I start an
interactive session with qsub -I, read the allocated nodes from
$PBS_NODEFILE, convert the hostnames to their IB interface names, and pass
them to addprocs. My setup file:
# Read the list of nodes that Torque allocated to this job.
filestream = open(ENV["PBS_NODEFILE"])
linearray = readlines(filestream)
close(filestream)
# Map each hostname to its IPoIB interface name (the suffix is specific to my site).
strippedarray = similar(linearray)
for i in 1:length(linearray)
    strippedarray[i] = strip(linearray[i]) * "-ipoib.ipoib"
end
# Start one worker for each line of the node file.
for i in 1:length(strippedarray)
    addprocs([strippedarray[i]])
end
print(workers())
To start an interactive job, depending on your node configuration and queue
names:
qsub -I -l nodes=2:ppn=32,walltime=00:30:00 -q normal
When you get your nodes, start julia with the above setup file:
julia --load setupfilename
This should add the worker processes and then give you the julia prompt.
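Before starting julia it can help to check which names the setup file will hand to addprocs. The following is only a sketch: ib_names is a hypothetical helper name, the "-ipoib.ipoib" suffix is the one from my site, and node01/node02 are made-up hostnames for the demo. Unlike the julia code, it also skips blank lines.

```shell
# Hypothetical helper: print the IPoIB name for every line of a node file.
ib_names() {
    # Trim surrounding whitespace, drop blank lines, append the site-specific suffix.
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        -e '/^$/d' -e 's/$/-ipoib.ipoib/' "$1"
}

# Inside a job you would call: ib_names "$PBS_NODEFILE"
# Stand-alone demo with a made-up node file:
demo_nodefile=$(mktemp)
printf 'node01\nnode01\nnode02\n' > "$demo_nodefile"
ib_names "$demo_nodefile"
rm -f "$demo_nodefile"
```

If the printed names do not resolve on your cluster, the suffix needs to be changed to whatever your site uses for its IB interfaces.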
But it looks like something is wrong with your modules? The first error says
that julia/0.2.0 needs gcc/4.7.2 to be loaded before it.
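The syntax error is a separate problem. The julia lines end up in the job script as shell commands, and the inner double quotes are consumed by the echo string, so bash sees include(a0d0.jl) and cannot parse it. A rough sketch of a loop that avoids this, assuming the module names from your error output and the /data directory, test.jl and test(i) from your script (make_job is a hypothetical helper name, and I have not tried this on your cluster):

```shell
# Write one Torque job script per index; julia code is passed to julia -e
# instead of being left for bash to interpret.
make_job() {
    i="$1"
    # Unquoted EOF: $i is expanded now, the quotes are written out literally.
    cat > "test1job$i" <<EOF
#!/bin/bash
#PBS -l walltime=00:10:00
cd /data
module add gcc/4.7.2
module add julia/0.2.0
julia -e 'include("test.jl"); test($i)'
EOF
}

for i in 1 2 3 4 5 6 7 8 9; do
    make_job "$i"
    # qsub "test1job$i"   # uncomment on the cluster to submit each job
done
```

The qsub line is commented out so the generated files can be inspected first. If the module error persists, running the two module add lines by hand in an interactive session should show which one fails.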
On Friday, April 25, 2014 5:09:57 AM UTC-7, Isaac wrote:
>
> Hi All,
>
> I also tried to submit the julia jobs on the cluster but failed. I wrote
> the job script as follows:
> for((i = 1; i < 10; i++))
> do
> echo "# cd /data
> #PBS -l walltime=00:10:00
> module add gcc/4.7.2
> module add julia/0.2.0
> module load julia
> include("test.jl")
> test($i)">test1job$i;
> qsub test1job$i;
> done
> I got the errors:
> julia/0.2.0(16):ERROR:151: Module 'julia/0.2.0' depends on one of the
> module(s) 'gcc/4.7.2'
> julia/0.2.0(16):ERROR:102: Tcl command execution failed: prereq gcc/4.7.2
>
> /cm/local/apps/torque/current/mom_priv/jobs/1053.cluster.SC: line 7:
> syntax error near unexpected token `a0d0.jl'
> /cm/local/apps/torque/current/mom_priv/jobs/1053.cluster.SC: line 7:
> `include(a0d0.jl)'
>
> Does anybody know how to write the job script to submit julia job on a
> cluster? Could you give an example?
> Thanks in advance.
>
> Isaac