Dear DMTCP team,

I am trying to use DMTCP with MPI applications that use the MPI_THREAD_SERIALIZED 
threading mode. As you know, this requires initializing the application with 
MPI_Init_thread instead of MPI_Init. Unfortunately, dmtcp_launch has never 
succeeded for these applications; the only error I get is "Bus error".

Replacing MPI_Init_thread with MPI_Init (which uses MPI_THREAD_SINGLE) fixes 
the problem; however, that mode is not practical for my applications.
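
For reference, my applications initialize MPI roughly as in the minimal sketch 
below (illustrative only; the real applications are larger and the omitted work 
is a placeholder):

    /* Minimal sketch of the initialization mode that triggers the failure. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided;

        /* Request MPI_THREAD_SERIALIZED instead of calling MPI_Init(). */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &provided);
        if (provided < MPI_THREAD_SERIALIZED)
            fprintf(stderr, "Warning: only thread level %d provided\n", provided);

        /* ... application work; MPI calls are serialized across threads ... */

        MPI_Finalize();
        return 0;
    }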

My configurations:

- DMTCP 2.4.5

- OpenMPI-ULFM (which is based on OpenMPI 1.7)

- A network file system is used

dmtcp_launch trace:

[5209] TRACE at dmtcp_launch.cpp:442 in main; REASON='dmtcp_launch starting new 
     argv[0] = mpirun
[5209] TRACE at coordinatorapi.cpp:566 in connectToCoordOnStartup; 
REASON='sending coordinator handshake'
     UniquePid::ThisProcess() = 216034594ce6503-5209-5800a2ed
[5209] TRACE at coordinatorapi.cpp:573 in connectToCoordOnStartup; REASON='Got 
virtual pid from coordinator'
     hello_remote.virtualPid = 281000
[5209] TRACE at shareddata.cpp:193 in initialize; REASON='Shared area mapped'
     sharedDataHeader = 0x7ff19f819000
[5209] TRACE at dmtcp_launch.cpp:771 in setLDPreloadLibs; REASON='getting value 
     getenv("LD_PRELOAD") = <removed>
     preloadLibs =  <removed>
     preloadLibs32 =
[281000] TRACE at shareddata.cpp:193 in initialize; REASON='Shared area mapped'
     sharedDataHeader = 0x7f7a5984d000
[281000] TRACE at dmtcpworker.cpp:260 in 
prepareLogAndProcessdDataFromSerialFile; REASON='Root of processes tree'
[281000] TRACE at dmtcpworker.cpp:315 in DmtcpWorker; REASON='  
Running '
     jalib::Filesystem::GetProgramName() = orterun
     getenv ("LD_PRELOAD") = <removed>
[281000] TRACE at dmtcpworker.cpp:111 in restoreUserLDPRELOAD; 
     preload =
     userPreload =
[281000] TRACE at coordinatorapi.cpp:127 in init; REASON='Informing coordinator of new process'
     UniquePid::ThisProcess() = 216034594ce6503-281000-5800a2ed
[281000] TRACE at processinfo.cpp:180 in growStack; REASON='Original stack area'
     (void*)area.addr = 0x7fff33fdf000
     area.size = 90112
[281000] TRACE at processinfo.cpp:218 in growStack; REASON='New stack size'
     (void*)area.addr = 0x7fff30004000
     area.size = 67047424
[281000] TRACE at fileconnlist.cpp:385 in scanForPreExisting; REASON='scanning 
pre-existing device'
     fd = 0
     device = /dev/pts/18
Bus error

dmtcp_coordinator trace:

[26364] TRACE at dmtcp_coordinator.cpp:962 in onConnect; REASON='accepting new 
     remote.sockfd() = 5
     (strerror((*__errno_location ()))) = No such file or directory
[26364] TRACE at dmtcp_coordinator.cpp:971 in onConnect; REASON='Reading from 
incoming connection...'
[26364] TRACE at dmtcp_coordinator.cpp:1263 in validateNewWorkerProcess; 
REASON='First process connected.  Creating new computation group.'
     compId = 216034594ce6503-281000-5800a2ed
[26364] NOTE at dmtcp_coordinator.cpp:1079 in onConnect; REASON='worker 
     hello_remote.from = 216034594ce6503-5209-5800a2ed
[26364] TRACE at dmtcp_coordinator.cpp:1084 in onConnect; REASON='END'
     clients.size() = 1
[26364] NOTE at dmtcp_coordinator.cpp:867 in onData; REASON='Updating process 
Information after exec()'
     progname = orterun
     msg.from = 216034594ce6503-281000-5800a2ed
     client->identity() = 216034594ce6503-5209-5800a2ed
[26364] NOTE at dmtcp_coordinator.cpp:917 in onDisconnect; REASON='client 
     client->identity() = 216034594ce6503-281000-5800a2ed
     client->progname() = orterun
[26364] TRACE at dmtcp_coordinator.cpp:892 in removeStaleSharedAreaFile; 
REASON='Removing sharedArea file.'
     o.str() = <tmp dir 

I have correlated the "Bus error" with the call to MPI_Init_thread, but this 
correlation may not reflect the real cause. I hope the logs above help you 
identify the root cause of this error.

Best Regards,

Sara S. Hamouda
PhD Candidate (Computer Systems Group)
College of Engineering and Computer Science
The Australian National University