One thing stands out right away: why are you specifying a hostfile? Did you remember to configure OMPI with --with-tm so we launch via Torque? If not, then you could hit issues as you are actually attempting to launch via ssh, which has implications on a Torque-based system.
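A quick way to check the first point is to look for the tm components in the `ompi_info` output. The snippet below is only a sketch: the helper name `has_tm_support` is made up for illustration, and the pattern assumes the usual `MCA plm: tm (...)` / `MCA ras: tm (...)` component lines that `ompi_info` prints.

```shell
#!/bin/bash
# Hypothetical helper: reads `ompi_info` output on stdin and succeeds if the
# Torque (tm) launcher (plm) or allocator (ras) component is listed.
has_tm_support() {
    grep -Eq 'MCA (plm|ras): tm'
}

# Only run the check where Open MPI is actually installed.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_support; then
        echo "tm support found: mpirun can launch through Torque"
    else
        # Assumption: without tm, this build falls back to the rsh/ssh launcher.
        echo "no tm support: mpirun will likely launch via rsh/ssh"
    fi
fi
```

If tm support is there, mpirun picks up the Torque allocation by itself, so the launch line in the submit script should not need `--hostfile` (or `-np`) at all, e.g. `mpirun /home/khan/a.out < /home/khan/iter`.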
On Mar 29, 2012, at 8:51 AM, Raju wrote:

> Hi Team,
>
> I am using Qlogic Infiniband and Openmpi-1.5.3. I can able to run the jobs by
> CLI without any issues, but when iam submitting over torque scheduler facing
> the below issue.
>
> I am facing issue while submitting the jobs through Torque scheduler. Error
> file is attached
>
> Overview of the problem:
>
> node1.ibab.ac.in.5910Driver initialization failure on /dev/ipath (err=23)
> --------------------------------------------------------------------------
> PSM was unable to open an endpoint. Please make sure that the network link is
> active on the node and the hardware is functioning.
>
> Error: Failure in initializing endpoint
>
> I gone through the link
> http://www.open-mpi.org/community/lists/users/2011/12/17888.php for solution,
> same followed but no luck.
>
> I exported the value in my input submit script file as export
> PSM_SHAREDCONTEXTS_MAX=16, and submitted the job.
>
> Sample inputfile is
>
> #!/bin/bash
> #PBS -N matmul
> #PBS -l nodes=1:ppn=1
> node=1
> ppn=1
> nprocs=`expr ${node} \* ${ppn}`
> echo "--- PBS_NODEFILE CONTENT ---"
> cat $PBS_NODEFILE
> export PSM_SHAREDCONTEXTS_MAX=16
>
> mpirun -np ${nprocs} --hostfile $PBS_NODEFILE /home/khan/a.out <
> /home/khan/iter
>
> Please let me know I doing correct or not ? and suggest me for best out ?
>
> Regards,
> Bhagya Raju K
> <errfile.txt>