Mike Dubman wrote:
Hello guys,
When executing the following command with mtt and ompi 1.3.3:
mpirun --host witch15,witch15,witch15,witch15,witch16,witch16,witch16,witch16,witch17,witch17,witch17,witch17,witch18,witch18,witch18,witch18,witch19,witch19,witch19,witch19 -np 20 --mca btl_openib_use_srq 1 --mca btl self,sm,openib
~mtt/mtt-scratch/20090809140816_dellix8_11812/installs/mnum/tests/ibm/ibm/dynamic/loop_spawn
I get the following errors:
parent: MPI_Comm_spawn #0 return : 0
parent: MPI_Comm_spawn #20 return : 0
parent: MPI_Comm_spawn #40 return : 0
parent: MPI_Comm_spawn #60 return : 0
parent: MPI_Comm_spawn #80 return : 0
parent: MPI_Comm_spawn #100 return : 0
parent: MPI_Comm_spawn #120 return : 0
parent: MPI_Comm_spawn #140 return : 0
parent: MPI_Comm_spawn #160 return : 0
parent: MPI_Comm_spawn #180 return : 0
parent: MPI_Comm_spawn #200 return : 0
parent: MPI_Comm_spawn #220 return : 0
parent: MPI_Comm_spawn #240 return : 0
parent: MPI_Comm_spawn #260 return : 0
parent: MPI_Comm_spawn #280 return : 0
parent: MPI_Comm_spawn #300 return : 0
parent: MPI_Comm_spawn #320 return : 0
parent: MPI_Comm_spawn #340 return : 0
parent: MPI_Comm_spawn #360 return : 0
parent: MPI_Comm_spawn #380 return : 0
parent: MPI_Comm_spawn #400 return : 0
parent: MPI_Comm_spawn #420 return : 0
parent: MPI_Comm_spawn #440 return : 0
parent: MPI_Comm_spawn #460 return : 0
parent: MPI_Comm_spawn #480 return : 0
parent: MPI_Comm_spawn #500 return : 0
parent: MPI_Comm_spawn #520 return : 0
parent: MPI_Comm_spawn #540 return : 0
parent: MPI_Comm_spawn #560 return : 0
parent: MPI_Comm_spawn #580 return : 0
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it encountered an error:
Error: system limit exceeded on number of pipes that can be open
Node: witch19
when attempting to start process rank 0.
This can be resolved by setting the mca parameter opal_set_max_sys_limits to 1,
increasing your limit descriptor setting (using limit or ulimit commands),
asking the system administrator for that node to increase the system limit, or
by rearranging your processes to place fewer of them on that node.
Do you know what OS params should be changed in order to resolve it?
I thought this error message just got a makeover. So, if it's
insufficient, it should probably be improved further. The message
suggests:
1) setting opal_set_max_sys_limits to 1, which seems pretty
self-explanatory
2) increasing descriptor limit using limit or ulimit, which requires a
little more OS familiarity
3) cutting a deal with sysadmin
4) rearranging processes
So, which part are you asking about? #2? If so, try "man limit" and
look at the places where you see anything about "descriptor[s]".
Answers depend on the shell you use.
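
To make #2 a bit more concrete: in a Bourne-style shell (sh, bash), something like the sketch below shows the current descriptor limits and raises the soft limit as far as the hard limit allows. This is only a sketch. It assumes a Bourne-style shell, and the variable names soft and hard are mine, purely for illustration. In csh/tcsh the equivalent is the "limit" builtin (e.g. "limit descriptors").

```shell
# Show the current per-process limits on open file descriptors.
# Pipes consume descriptors, so they count against this limit too.
soft=$(ulimit -Sn)   # soft limit: the value actually enforced
hard=$(ulimit -Hn)   # hard limit: ceiling up to which the soft limit may be raised
echo "descriptor limits: soft=$soft hard=$hard"

# Raise the soft limit to the hard limit for this shell and
# for anything launched from it.
ulimit -Sn "$hard"
echo "new soft limit: $(ulimit -Sn)"
```

The change only applies to the shell where it runs and to its children, so it probably has to be in effect on the node named in the error (witch19 here), for instance through that node's shell startup files, and not only in the shell where mpirun is typed. Going beyond the hard limit is where #3 comes in.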