I am stuck finishing the port of ROMIO to Open MPI. The problem involves MPI_Comm_dup used together with MPI_Keyval_create, MPI_Keyval_free, MPI_Attr_get and MPI_Attr_put.

Here is a simple program that reproduces my problem:

===========================================
#include <stdio.h>
#include "mpi.h"

int copy_fct(MPI_Comm comm, int keyval, void *extra, void *attr_in, void **attr_out, int *flag) {
   return MPI_SUCCESS;
}

int delete_fct(MPI_Comm comm, int keyval, void *attr_val, void *extra) {
   MPI_Keyval_free(&keyval);
   return MPI_SUCCESS;
}

int main(int argc, char **argv) {
   int i, found, attribute_val=100, keyval = MPI_KEYVAL_INVALID;
   MPI_Comm dupcomm;

   MPI_Init(&argc,&argv);

   for (i=0; i<100;i++) {
       /* This simulates the MPI_File_open() */
       if (keyval == MPI_KEYVAL_INVALID) {
               MPI_Keyval_create((MPI_Copy_function *) copy_fct, (MPI_Delete_function *) delete_fct, &keyval, NULL);
               MPI_Attr_put(MPI_COMM_WORLD, keyval, &attribute_val);
               MPI_Comm_dup(MPI_COMM_WORLD, &dupcomm);
       }
       else {
               MPI_Comm_dup(MPI_COMM_WORLD, &dupcomm);
               MPI_Attr_get(MPI_COMM_WORLD, keyval, (void *) &attribute_val, &found);
       }
       /* This simulates the MPI_File_close() */
       MPI_Comm_free(&dupcomm);
   }
   MPI_Finalize();
   return 0;
}
===============================================
I run it on a single process and get this error:
*** An error occurred in MPI_Attr_get
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_OTHER: known error not in list
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

I think this error appears because keyval no longer exists.

This program runs fine on MPICH2 (ROMIO ships with MPICH2).
This program runs fine when delete_fct() does not call MPI_Keyval_free.
This program runs fine when I call MPI_Keyval_create with "MPI_NULL_COPY_FN" instead of "(MPI_Copy_function *) copy_fct" (this is quite strange: copy_fct does nothing!).

I suspect a bug in Open MPI: in ompi/attribute/attribute.c, two functions call OBJ_RELEASE, ompi_attr_delete and ompi_attr_free_keyval, so the reference count is decremented twice.

Pascal

