Dear all.

Thanks to the Open MPI fault-tolerance developers, the checkpoint/restart
function works very well for my MPI applications. However, checkpointing does
not work for my GASNet applications, which use the MPI conduit.
Can anyone help me?

I wrote some code with the GASNet API (Global-Address Space Networking:
http://gasnet.cs.berkeley.edu/) and built it with the MPI conduit, so my
program runs well under Open MPI's mpirun. I therefore expected to be able to
use the transparent checkpoint/restart support that Open MPI provides through
BLCR as well. Contrary to my expectation, checkpointing fails with the
following error message:
--------------------------------------------------------------------------
Error: The process with PID 13896 is not checkpointable.
       This could be due to one of the following:
        - An application with this PID doesn't currently exist 
        - The application with this PID isn't checkpointable  
        - The application with this PID isn't an OPAL application.
       We were looking for the named files:
         /tmp/opal_cr_prog_write.13896
         /tmp/opal_cr_prog_read.13896
--------------------------------------------------------------------------
1 more process has sent help message help-opal-checkpoint.txt
Set MCA parameter "orte_base_help_aggregate" to 0 to see all help 
 0] 13896) Step 53
 0] 15100) Step 53
 0] 13896) Step 54
 0] 15100) Step 54
 0] 13896) Step 55
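The error says the checkpoint tool looked for two named files keyed by the PID.
To narrow this down, here is a minimal diagnostic sketch that checks whether
those files exist for a given PID while the job is running. It assumes the
/tmp/opal_cr_prog_{write,read}.<pid> pattern shown in the message above holds
on this system; the helper names are hypothetical and not part of Open MPI.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>

/* Hypothetical helper: build the path of one of the named files the
 * checkpoint tool reported looking for, e.g. /tmp/opal_cr_prog_write.<pid>.
 * `kind` must be "write" or "read"; returns -1 for anything else,
 * otherwise the snprintf() result. The pattern is copied from the error
 * message and is an assumption, not a documented Open MPI interface. */
static int opal_cr_fifo_path(char *buf, size_t len, const char *kind, pid_t pid)
{
    if (strcmp(kind, "write") != 0 && strcmp(kind, "read") != 0)
        return -1;
    return snprintf(buf, len, "/tmp/opal_cr_prog_%s.%ld", kind, (long)pid);
}

/* Hypothetical helper: return 1 if both named files exist for `pid`,
 * 0 as soon as one is missing. Prints each path it checks. */
static int opal_cr_fifos_present(pid_t pid)
{
    const char *kinds[] = { "write", "read" };
    char path[256];

    for (size_t i = 0; i < sizeof kinds / sizeof kinds[0]; i++) {
        opal_cr_fifo_path(path, sizeof path, kinds[i], pid);
        if (access(path, F_OK) != 0) {   /* F_OK: existence check only */
            printf("missing: %s\n", path);
            return 0;
        }
        printf("found:   %s\n", path);
    }
    return 1;
}
```

Calling opal_cr_fifos_present() with the PID given to the checkpoint command
only shows whether the files exist; it cannot by itself explain why they are
missing for the GASNet processes.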

In my application, MPI_Initialized() reports that MPI is initialized.

Thank you for reading, and have a great day.
