Hi,

Something went wrong last night and all our MTT tests had the following output:
[odin005.cs.indiana.edu:28167] [[46567,0],0] ORTE_ERROR_LOG: Error in file
base/plm_base_launch_support.c at line 161
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error.
More information may be available above.
--------------------------------------------------------------------------

I have not tracked down what caused this, but the more immediate problem is that after printing this error, mpirun exited with status 0 instead of a nonzero error status, so the failure looks like a success to anything that checks the return code.
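
For what it's worth, here is a minimal sketch of how a test harness could catch this case: exit status 0 while the launch-failure banner is in the output. The helper names (is_silent_failure, run_and_check) are made up for illustration and are not part of MTT or Open MPI; the banner string is copied from the output above.

```python
import subprocess

# Banner text copied from the mpirun output quoted above.
ERROR_BANNER = "mpirun was unable to start the specified application"


def is_silent_failure(returncode, output):
    """True when the exit status claims success but the output
    contains the launch-failure banner."""
    return returncode == 0 and ERROR_BANNER in output


def run_and_check(argv):
    """Hypothetical wrapper: run argv, capture stdout and stderr
    together, and return (returncode, silent_failure)."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, is_silent_failure(proc.returncode, proc.stdout)
```

Something like run_and_check(["mpirun", "-np", "2", "./a.out"]) would then flag last night's runs; the command line here is only an example.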



Also, when running the test 'orte/test/mpi/abort' I get the following error output:
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 17822 on
node odin013 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

This is wrong: it should say that the process aborted. It looks like the job state is somehow being set to ORTE_JOB_STATE_ABORTED_WO_SYNC instead of ORTE_JOB_STATE_ABORTED.

Thanks,

Tim

