Hello,

I have mpiBLAST 1.4.0 installed on a 128-node cluster with SGE. mpiBLAST worked fine for a while, then suddenly started failing with no apparent cause. I used the same submission script and the same parameters, except that I increased the number of processors from 2 to 16. When I later reduced the number of CPUs again, I kept getting the following output in the […].o<PID> file:

 

cp command failed!

command: cp /ibrixfs1/users/mpiblast/yeast.nt.000.phr /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nt.000.phr

source = /ibrixfs1/users/mpiblast/yeast.nt.000.phr

dest = /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nt.000.phr

ret_value = 32512

cp command failed!

command: cp /ibrixfs1/users/mpiblast/yeast.nt.000.pin /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nt.000.pin

source = /ibrixfs1/users/mpiblast/yeast.nt.000.pin

dest = /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nt.000.pin

ret_value = 32512

cp command failed!

command: cp /ibrixfs1/users/mpiblast/yeast.nt.000.psq /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nt.000.psq

source = /ibrixfs1/users/mpiblast/yeast.nt.000.psq

dest = /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nt.000.psq

ret_value = 32512

cp command failed!

command: cp /ibrixfs1/users/mpiblast/yeast.nt.000.pnd /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nt.000.pnd

source = /ibrixfs1/users/mpiblast/yeast.nt.000.pnd

dest = /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nt.000.pnd

ret_value = 32512

cp command failed!

command: cp /ibrixfs1/users/mpiblast/yeast.nt.000.pni /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nt.000.pni

source = /ibrixfs1/users/mpiblast/yeast.nt.000.pni

dest = /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nt.000.pni

ret_value = 32512

cp command failed!

command: cp /ibrixfs1/users/mpiblast/yeast.nt.000.psd /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nt.000.psd

source = /ibrixfs1/users/mpiblast/yeast.nt.000.psd

dest = /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nt.000.psd

ret_value = 32512

cp command failed!

command: cp /ibrixfs1/users/mpiblast/yeast.nt.000.psi /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nt.000.psi

source = /ibrixfs1/users/mpiblast/yeast.nt.000.psi

dest = /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nt.000.psi

ret_value = 32512

[9]     22870   (9) unable to copy fragment!

[9] [MPI Abort by user] Aborting Program!

Could not remove /ibrixfs1/users/mpiblast/runs/mpiblast_tempZyrpla/yeast.nth8EHTO.pal:: No such file or directory

Terminating processes..

done.
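
A note on the ret_value above: if it is the raw status returned by a system()-style call (an assumption, the output itself does not say so), the exit code sits in the high byte. The sketch below decodes it; decode_wait_status is a hypothetical helper written for illustration and is not part of mpiBLAST.

```python
def decode_wait_status(status):
    """Split a 16-bit POSIX-style wait status into (exit_code, signal_number).

    Assumes the conventional layout: the low 7 bits hold the terminating
    signal (0 if none) and bits 8-15 hold the exit code.
    """
    signal_number = status & 0x7F
    exit_code = (status >> 8) & 0xFF
    return exit_code, signal_number


if __name__ == "__main__":
    # 32512 is the ret_value reported for every failed cp in the log.
    exit_code, signal_number = decode_wait_status(32512)
    print(f"exit code {exit_code}, signal {signal_number}")
    # prints "exit code 127, signal 0"
```

Under that assumption every cp exited with code 127, which a POSIX shell conventionally returns when the command cannot be found. That would point to the job environment (for example PATH on the compute nodes) rather than to the database files, but this is only a possible reading.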

 

I tried changing the ‘dest’ directory in the ncbirc file from /tmp to the network drive to avoid copying the files, but it did not help. I also tried deleting all files in the /tmp directory across all the nodes, but that did not help either.

 

Any suggestions or insight would be greatly appreciated,

Enis
