Interesting! I was running it on odin last night until around 11pm your time without problems.
I'll take a look....

On 3/25/08 6:35 AM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:

> Hi,
>
> Something went wrong last night and all our MTT tests had the following
> output:
> [odin005.cs.indiana.edu:28167] [[46567,0],0] ORTE_ERROR_LOG: Error in file
> base/plm_base_launch_support.c at line 161
> --------------------------------------------------------------------------
> mpirun was unable to start the specified application as it encountered
> an error.
> More information may be available above.
> --------------------------------------------------------------------------
>
> I have not tracked down what caused this, but the more immediate problem
> is that after giving this error mpirun returned '0' instead of a more
> sane error value.
>
> Also, when running the test 'orte/test/mpi/abort' I get the error output:
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 17822 on
> node odin013 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>
> Which is wrong, it should be saying that the process was aborted. It
> looks like somehow the job state is being set to
> ORTE_JOB_STATE_ABORTED_WO_SYNC instead of ORTE_JOB_STATE_ABORTED.
>
> Thanks,
>
> Tim