Dear Samuel,

Just as you replied I was trying that on the compute nodes. Surprise, surprise... the value returned for both the hard and soft limits is 1024.
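For reference, the check on a compute node was along these lines (assuming a bash shell, as you suggested below; the 1024 values are what our nodes report, not universal defaults):

  # Query the per-process open file descriptor limits (bash built-ins)
  ulimit -Sn   # soft limit -- reports 1024 on our compute nodes
  ulimit -Hn   # hard limit -- also 1024 here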
Thanks for confirming my suspicions...

Regards,
Tim.

On Mar 30, 2011, at 7:41 PM, Samuel K. Gutierrez wrote:

Hi,

It sounds like Open MPI is hitting your system's open file descriptor limit. If that's the case, one potential workaround is to have your system administrator raise the file descriptor limits. On a compute node, what does "ulimit -a" show (using bash)?

Hope that helps,

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Mar 30, 2011, at 5:22 PM, Timothy Stitt wrote:

Dear Open MPI developers,

One of our users was running a benchmark on a 1032-core simulation. He had a successful run at 900 cores, but when he stepped up to 1032 cores the job just stalled and his logs contained many occurrences of the following line:

[d6copt368.crc.nd.edu][[25621,1],0][btl_tcp_component.c:885:mca_btl_tcp_component_accept_handler] accept() failed: Too many open files (24)

The simulation has a single master task that communicates with all the other tasks to write out some I/O via the master, so we are assuming the message is related to this bottleneck. Is there a 1024 limit on the number of open files/connections, for instance? Can anyone confirm the meaning of this error and, secondly, provide a resolution that hopefully doesn't involve a code rewrite?

Thanks in advance,

Tim.

Tim Stitt PhD (User Support Manager).
Center for Research Computing | University of Notre Dame |
P.O. Box 539, Notre Dame, IN 46556 | Phone: 574-631-5287 | Email: tst...@nd.edu
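For anyone hitting the same wall: the usual fix is for the administrator to raise the hard nofile limit on the compute nodes, after which the soft limit can be bumped in the job environment. A rough sketch follows; it assumes a typical Linux setup with pam_limits, and the 8192 value is only illustrative, not a recommendation from this thread.

  # /etc/security/limits.conf (requires root; assumes pam_limits is enabled)
  #   *   soft   nofile   8192
  #   *   hard   nofile   8192

  # In the job/shell environment, raise the soft limit up to the new hard limit
  ulimit -S -n 8192
  ulimit -Sn          # verify: should now report 8192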