Mike Dubman wrote:

Hello guys,


When executing the following command with MTT and Open MPI 1.3.3:

mpirun --host witch15,witch15,witch15,witch15,witch16,witch16,witch16,witch16,witch17,witch17,witch17,witch17,witch18,witch18,witch18,witch18,witch19,witch19,witch19,witch19 -np 20   --mca btl_openib_use_srq 1  --mca btl self,sm,openib  ~mtt/mtt-scratch/20090809140816_dellix8_11812/installs/mnum/tests/ibm/ibm/dynamic/loop_spawn

I get the following errors:

parent: MPI_Comm_spawn #0 return : 0
parent: MPI_Comm_spawn #20 return : 0
parent: MPI_Comm_spawn #40 return : 0
parent: MPI_Comm_spawn #60 return : 0
parent: MPI_Comm_spawn #80 return : 0
parent: MPI_Comm_spawn #100 return : 0
parent: MPI_Comm_spawn #120 return : 0
parent: MPI_Comm_spawn #140 return : 0
parent: MPI_Comm_spawn #160 return : 0
parent: MPI_Comm_spawn #180 return : 0
parent: MPI_Comm_spawn #200 return : 0
parent: MPI_Comm_spawn #220 return : 0
parent: MPI_Comm_spawn #240 return : 0
parent: MPI_Comm_spawn #260 return : 0
parent: MPI_Comm_spawn #280 return : 0
parent: MPI_Comm_spawn #300 return : 0
parent: MPI_Comm_spawn #320 return : 0
parent: MPI_Comm_spawn #340 return : 0
parent: MPI_Comm_spawn #360 return : 0
parent: MPI_Comm_spawn #380 return : 0
parent: MPI_Comm_spawn #400 return : 0
parent: MPI_Comm_spawn #420 return : 0
parent: MPI_Comm_spawn #440 return : 0
parent: MPI_Comm_spawn #460 return : 0
parent: MPI_Comm_spawn #480 return : 0
parent: MPI_Comm_spawn #500 return : 0
parent: MPI_Comm_spawn #520 return : 0
parent: MPI_Comm_spawn #540 return : 0
parent: MPI_Comm_spawn #560 return : 0
parent: MPI_Comm_spawn #580 return : 0
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it encountered an error:

Error: system limit exceeded on number of pipes that can be open
Node: witch19

when attempting to start process rank 0.

This can be resolved by setting the mca parameter opal_set_max_sys_limits to 1,
increasing your limit descriptor setting (using limit or ulimit commands),
asking the system administrator for that node to increase the system limit, or 
by rearranging your processes to place fewer of them on that node.


Do you know what OS params should be changed in order to resolve it?

I thought this error message just got a makeover.  So, if it's insufficient, it should probably be improved further.  The message suggests:

1) setting opal_set_max_sys_limits to 1, which seems pretty self-explanatory

2) increasing descriptor limit using limit or ulimit, which requires a little more OS familiarity

3) cutting a deal with the sysadmin

4) rearranging processes

So, which part are you asking about?  #2?  If so, try "man limit" and look at the places where you see anything about "descriptor[s]".  Answers depend on the shell you use.
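For a Bourne-style shell such as bash, a minimal sketch of option #2 could look like the following. It only inspects and raises the open-descriptor limit of the current shell, and it assumes that this limit is the one your job is hitting. The limit has to be in effect on the node named in the error, and processes started there by a remote launcher may not inherit it from your interactive shell. For csh/tcsh the corresponding builtin is "limit" (I believe the resource is called "descriptors", but check "man limit" on your system).

```shell
# Show the current soft and hard limits on open file descriptors (bash syntax).
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft limit: $soft, hard limit: $hard"

# Raise the soft limit up to the hard limit for this shell and its children.
# Skipped when the hard limit is reported as "unlimited".
if [ "$hard" != "unlimited" ]; then
    ulimit -Sn "$hard"
fi
echo "soft limit now: $(ulimit -Sn)"

# Option #1 from the message is passed like any other MCA parameter, e.g.:
#   mpirun --mca opal_set_max_sys_limits 1 ... <rest of the original command>
```

Going above the hard limit is not possible as an ordinary user, which is where option #3 comes in.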
