Interesting! I was running it on odin last night until around 11pm your time
without problems.

I'll take a look...


On 3/25/08 6:35 AM, "Tim Prins" <tpr...@cs.indiana.edu> wrote:

> Hi,
> 
> Something went wrong last night and all our MTT tests had the following
> output:
> [odin005.cs.indiana.edu:28167] [[46567,0],0] ORTE_ERROR_LOG: Error in file
> base/plm_base_launch_support.c at line 161
> --------------------------------------------------------------------------
> mpirun was unable to start the specified application as it encountered
> an error.
> More information may be available above.
> --------------------------------------------------------------------------
> 
> I have not tracked down what caused this, but the more immediate problem
> is that after printing this error, mpirun returned '0' instead of a
> nonzero error value.
> 
> 
> 
> Also, when running the test 'orte/test/mpi/abort' I get the error output:
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 17822 on
> node odin013 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> 
> This is wrong: it should say that the process was aborted. It looks
> like the job state is somehow being set to
> ORTE_JOB_STATE_ABORTED_WO_SYNC instead of ORTE_JOB_STATE_ABORTED.
> 
> Thanks,
> 
> Tim
> 
> 
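
For anyone who wants to check for the first problem quoted above (a failure banner printed, but exit status 0), here is a rough sketch of such a check in Python. It is not part of MTT or Open MPI. The function name and the stand-in command are hypothetical, and the only text taken from the report is the banner line that mpirun printed. It assumes Python 3.7 or later.

```python
import subprocess
import sys

# Banner text copied from the mpirun output quoted in the report.
FAILURE_MARKER = "mpirun was unable to start the specified application"


def exit_status_is_consistent(cmd, marker=FAILURE_MARKER):
    """Run cmd and return (returncode, consistent).

    consistent is False when the output contains the failure banner but the
    exit status is still 0, which is the behaviour described in the report.
    (Hypothetical helper, for illustration only.)
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr
    reported_failure = marker in output
    consistent = not (reported_failure and result.returncode == 0)
    return result.returncode, consistent


if __name__ == "__main__":
    # Stand-in for a misbehaving launcher (an assumption, not real mpirun):
    # it prints the failure banner to stderr and then exits with status 0.
    fake_launcher = [
        sys.executable,
        "-c",
        f"import sys; sys.stderr.write({FAILURE_MARKER!r}); sys.exit(0)",
    ]
    print(exit_status_is_consistent(fake_launcher))
```

To try it against a real launch, pass the actual command line as a list in place of the stand-in command.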

