We are working towards thread safety, but nowhere near ready yet.
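In the meantime, serializing access to the dynamic-process calls, as you already do with your mutex, is the right workaround. A minimal sketch of what I mean on the accept side is below; the names, the loop structure, and the struct are illustrative assumptions, not taken from your attached event_logger.c:

#include <mpi.h>
#include <pthread.h>

/* All threads funnel MPI_Comm_accept / MPI_Comm_disconnect through one lock. */
static pthread_mutex_t accept_lock = PTHREAD_MUTEX_INITIALIZER;

struct logger_arg {
    char port[MPI_MAX_PORT_NAME];  /* filled in earlier by MPI_Open_port()    */
    volatile int done;             /* set by the main thread to stop the loop */
};

static void *logger_thread(void *varg)
{
    struct logger_arg *arg = (struct logger_arg *)varg;

    while (!arg->done) {
        MPI_Comm client;

        /* Only one thread at a time may sit in the connect/accept code. */
        pthread_mutex_lock(&accept_lock);
        MPI_Comm_accept(arg->port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
        pthread_mutex_unlock(&accept_lock);

        /* ... receive and store this client's log events on 'client' ... */

        pthread_mutex_lock(&accept_lock);
        MPI_Comm_disconnect(&client);
        pthread_mutex_unlock(&accept_lock);
    }
    return NULL;
}

Until MPI_THREAD_MULTIPLE support is ready, the lock effectively makes the connect/accept path single-threaded, which matches the behavior you saw when you added the mutex.

On May 6, 2013, at 3:39 AM, Hugo Daniel Meyer <meyer.h...@gmail.com> wrote: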
> Sorry, I sent the previous message without finishing it.
>
> Hello to @ll.
>
> I'm not sure whether this is the correct list for this question, but I may be
> dealing with a bug.
>
> I have developed an event-logging mechanism in which application processes
> connect to event loggers (using MPI_Lookup_name, MPI_Open_port,
> MPI_Comm_connect, MPI_Comm_accept, etc.) that are part of another MPI
> application.
>
> I have written my own vprotocol in which, once a process receives a message,
> it tries to establish a connection with an event logger, which is a thread
> belonging to another MPI application.
>
> The event-logger application consists of one MPI process per node, with
> multiple threads waiting for connections from MPI processes of the main
> application.
>
> I suspect there is a problem with the critical regions when processes try to
> connect to the threads of the event logger.
>
> I'm attaching two short examples that I wrote to show the problem. First, I
> launch the event-logger application:
>
> mpirun -n 2 --machinefile machinefile2-th --report-uri URIFILE ./test-thread
>
> Then I launch the example like this:
>
> mpirun -n 16 --machinefile machine16 --ompi-server file:URIFILE ./thread_logger_connect
>
> I obtained this output:
>
> Published: radic_eventlog[1,6], ret=0
> [clus2:16104] [[39125,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
> [clus2:16104] [[39125,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
> [clus2:16104] *** An error occurred in MPI_Comm_accept
> [clus2:16104] *** on communicator MPI_COMM_SELF
> [clus2:16104] *** MPI_ERR_UNKNOWN: unknown error
> [clus2:16104] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 16104 on
> node clus2 exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
>
> If I use a mutex to serialize access to MPI_Comm_accept, the behavior is OK,
> but shouldn't MPI_Comm_accept be thread safe?
>
> Best regards.
>
> Hugo Meyer
>
> P.S.: This occurs with Open MPI 1.5.1 and also with an old version of the
> trunk (1.7).
>
> <event_logger.c><main-mpi-app.c>
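Just to confirm we are looking at the same pattern: I'm assuming the client side of the handshake is roughly the following. The service name is only inferred from the "Published: radic_eventlog[1,6]" line in your output, and the function is a made-up illustration; the real logic is in your attached main-mpi-app.c.

#include <mpi.h>
#include <stdio.h>

/* Look up a published event-logger service and connect to it over
 * MPI_COMM_SELF.  Returns MPI_SUCCESS or the failing call's error code. */
static int connect_to_logger(const char *service, MPI_Comm *logger)
{
    char port[MPI_MAX_PORT_NAME];
    int rc;

    /* Resolve the port name the logger published with MPI_Publish_name(). */
    rc = MPI_Lookup_name((char *)service, MPI_INFO_NULL, port);
    if (rc != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Lookup_name(%s) failed with %d\n", service, rc);
        return rc;
    }

    /* Each application process connects on its own, so on the logger side
     * several threads can end up inside MPI_Comm_accept at the same time. */
    return MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, logger);
}

If that is the case, the "Data unpack would read past end of buffer" errors in dpm_orte.c look like what I'd expect from concurrent accepts hitting code paths that are not yet thread safe, rather than a bug in your code.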
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel