We are working towards thread safety, but nowhere near ready yet. 

On May 6, 2013, at 3:39 AM, Hugo Daniel Meyer <meyer.h...@gmail.com> wrote:

> Sorry, I sent the previous message without finishing it.
> 
> Hello to @ll.
> 
> I'm not sure if this is the correct list to post this question, but I may be 
> dealing with a bug.
> 
> I have developed an event-logging mechanism where application processes connect 
> to event loggers (using MPI_Lookup_name, MPI_Open_port, MPI_Comm_connect, 
> MPI_Comm_accept, etc.) that are part of another MPI application.
> 
> I have developed my own vprotocol component where, once a process receives a 
> message, it tries to establish a connection with an event logger, which is a 
> thread belonging to another MPI application.
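> 
> The client side is roughly like the following sketch (illustrative only; the 
> real code is in the attached main-mpi-app.c, and the service name is just a 
> placeholder):
> 
>   #include <mpi.h>
> 
>   int main(int argc, char **argv)
>   {
>       char port[MPI_MAX_PORT_NAME];
>       MPI_Comm logger;
> 
>       MPI_Init(&argc, &argv);
> 
>       /* Resolve the name published by the event logger; the lookup is
>        * answered by the server given with --ompi-server file:URIFILE. */
>       MPI_Lookup_name("radic_eventlog", MPI_INFO_NULL, port);
> 
>       /* Each process connects on its own, hence MPI_COMM_SELF. */
>       MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &logger);
> 
>       /* ... send logged events over 'logger' ... */
> 
>       MPI_Comm_disconnect(&logger);
>       MPI_Finalize();
>       return 0;
>   }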
> 
> The event-logger application consists of one MPI process per node, with 
> multiple threads waiting for connections from MPI processes of the main 
> application.
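> 
> Roughly, each logger process is structured like this sketch (illustrative 
> only; the real code is in the attached event_logger.c, and the thread count 
> and service name are placeholders):
> 
>   #include <mpi.h>
>   #include <pthread.h>
>   #include <stdio.h>
> 
>   #define NUM_LOGGER_THREADS 4            /* placeholder */
> 
>   static char port[MPI_MAX_PORT_NAME];
> 
>   static void *accept_loop(void *arg)
>   {
>       MPI_Comm client;
>       /* Each logger thread blocks here waiting for a client; these are
>        * the concurrent MPI_Comm_accept calls that seem to trigger the
>        * problem. */
>       MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
>       /* ... receive and log events from 'client' ... */
>       MPI_Comm_disconnect(&client);
>       return NULL;
>   }
> 
>   int main(int argc, char **argv)
>   {
>       int provided, ret, i;
>       pthread_t tid[NUM_LOGGER_THREADS];
> 
>       MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
> 
>       MPI_Open_port(MPI_INFO_NULL, port);
>       ret = MPI_Publish_name("radic_eventlog", MPI_INFO_NULL, port);
>       printf("Published: radic_eventlog, ret=%d\n", ret);
> 
>       for (i = 0; i < NUM_LOGGER_THREADS; i++)
>           pthread_create(&tid[i], NULL, accept_loop, NULL);
>       for (i = 0; i < NUM_LOGGER_THREADS; i++)
>           pthread_join(tid[i], NULL);
> 
>       MPI_Unpublish_name("radic_eventlog", MPI_INFO_NULL, port);
>       MPI_Close_port(port);
>       MPI_Finalize();
>       return 0;
>   }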
> 
> I suspect that there is a problem with the critical regions when processes 
> try to connect to the threads of the event logger.
> 
> I'm attaching two short examples that I have written in order to show the 
> problem. First, I launch the event-logger application:
> 
> mpirun -n 2 --machinefile machinefile2-th --report-uri URIFILE ./test-thread
> 
> Then I launch the example like this:
> 
> mpirun -n 16 --machinefile machine16 --ompi-server file:URIFILE 
> ./thread_logger_connect
> 
> I have obtained this output:
> 
> Published: radic_eventlog[1,6], ret=0
> [clus2:16104] [[39125,1],1] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file dpm_orte.c at line 315
> [clus2:16104] [[39125,1],1] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file dpm_orte.c at line 315
> [clus2:16104] *** An error occurred in MPI_Comm_accept
> [clus2:16104] *** on communicator MPI_COMM_SELF
> [clus2:16104] *** MPI_ERR_UNKNOWN: unknown error
> [clus2:16104] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 16104 on
> node clus2 exiting improperly. There are two reasons this could occur:
> 
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
> 
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
> 
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> 
> 
> If I use a mutex to serialize access to MPI_Comm_accept, the behavior is OK, 
> but shouldn't MPI_Comm_accept be thread-safe?
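> 
> The serialization I'm referring to is essentially this (sketch, building on 
> the accept loop above; the mutex is just a pthread mutex of my own):
> 
>   static pthread_mutex_t accept_lock = PTHREAD_MUTEX_INITIALIZER;
> 
>   static void *accept_loop(void *arg)
>   {
>       MPI_Comm client;
> 
>       /* Only one thread at a time enters MPI_Comm_accept; with this
>        * serialization all connections are established correctly. */
>       pthread_mutex_lock(&accept_lock);
>       MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
>       pthread_mutex_unlock(&accept_lock);
> 
>       /* ... receive and log events from 'client' ... */
>       MPI_Comm_disconnect(&client);
>       return NULL;
>   }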
> 
> Best regards.
> 
> Hugo Meyer
> 
> P.S.: This occurs with Open MPI 1.5.1 and also with an old version of the 
> trunk (1.7).
> 
> <event_logger.c><main-mpi-app.c>
