Hi, I have had sporadic crashes with parallel HDF5, and when I checked my code with valgrind it seems that the crash is due to a bug in H5Smpio.c. I am using hdf5 version 1.8.8.
In routine H5S_obtain_datatype, starting near line 568 of H5Smpio.c, the
buffers are reallocated when they need to grow:

    /* Check if we need to increase the size of the buffers */
    if(outercount >= alloc_count) {
        MPI_Aint     *tmp_disp;         /* Temporary pointer to new displacement buffer */
        int          *tmp_blocklen;     /* Temporary pointer to new block length buffer */
        MPI_Datatype *tmp_inner_type;   /* Temporary pointer to inner MPI datatype buffer */

        /* Double the allocation count */
        alloc_count *= 2;

        /* Re-allocate the buffers */
        if(NULL == (tmp_disp = (MPI_Aint *)H5MM_realloc(disp, alloc_count * sizeof(MPI_Aint))))
            HGOTO_ERROR(H5E_DATASPACE, H5E_CANTALLOC, FAIL, "can't allocate array of displacements")
        disp = tmp_disp;
        if(NULL == (tmp_blocklen = (int *)H5MM_realloc(blocklen, alloc_count * sizeof(int))))
            HGOTO_ERROR(H5E_DATASPACE, H5E_CANTALLOC, FAIL, "can't allocate array of block lengths")
        blocklen = tmp_blocklen;
        if(NULL == (tmp_inner_type = (MPI_Datatype *)H5MM_realloc(inner_type, alloc_count * sizeof(MPI_Datatype))))
            HGOTO_ERROR(H5E_DATASPACE, H5E_CANTALLOC, FAIL, "can't allocate array of inner MPI datatypes")
    } /* end if */

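However, unlike "disp" and "blocklen", inner_type is never updated to point
to the new tmp_inner_type buffer. Whenever the reallocation moves the block,
inner_type is left pointing at freed memory, so later accesses through it are
invalid, and the reallocated memory is leaked because nothing ever frees it.
That would also explain why the crashes are sporadic: they only show up when
the block actually moves.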
The fix is to add one line:

    inner_type = tmp_inner_type;

after the call to H5MM_realloc, as is already done for the "disp" and
"blocklen" buffers. I have attached a patch for this. With this fix, parallel
HDF5 works very well for me, but without it I get many crashes. I hope this
can be fixed for the 1.8.9 release.

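
For anyone who wants to see the pattern outside of HDF5, here is a minimal
standalone sketch of the same grow-by-doubling logic. It uses plain realloc
and assumes H5MM_realloc behaves the same way with respect to moving the
block. The int_vec type and the int_vec_push/int_vec_free helpers are
hypothetical names made up for this illustration; they are not part of HDF5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array standing in for the inner_type buffer. */
typedef struct {
    int    *data;        /* current buffer (NULL while empty) */
    size_t  count;       /* number of elements in use */
    size_t  alloc_count; /* number of elements allocated */
} int_vec;

/* Append one value, doubling the allocation when the buffer is full.
 * Returns 0 on success, -1 if the reallocation fails (the old buffer
 * is then still valid and still owned by the caller). */
static int int_vec_push(int_vec *v, int value)
{
    assert(v != NULL);

    if (v->count >= v->alloc_count) {
        size_t new_alloc = v->alloc_count ? v->alloc_count * 2 : 4;
        int *tmp = realloc(v->data, new_alloc * sizeof *tmp);

        if (tmp == NULL)
            return -1;

        /* This assignment is the step that is missing for inner_type:
         * realloc may have moved the block, so the old pointer must
         * not be used again. */
        v->data = tmp;
        v->alloc_count = new_alloc;
    }

    v->data[v->count++] = value;
    return 0;
}

/* Release the buffer and reset the bookkeeping. */
static void int_vec_free(int_vec *v)
{
    free(v->data);
    v->data = NULL;
    v->count = 0;
    v->alloc_count = 0;
}
```

Going through a temporary pointer is still the right idea, since it keeps the
old buffer reachable if the reallocation fails. The temporary just has to be
copied back on success.
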
Martin J. Otte
Atmospheric Modeling and Analysis Division
U.S. Environmental Protection Agency
109 T.W. Alexander Drive, Mail Drop E243-03
Research Triangle Park, NC 27711 USA
Fax: 919-541-1379
Voice: 919-541-0147

Attachment: hdf5-H5Smpio_realloc.patch
