Further:
Debugging with MemoryScape reveals a segfault in H5SL.c (1.8.5) at line 1068:
...1068....
H5SL_REMOVE(SCALAR, slist, x, const haddr_t, key, -)
//H5SL_TYPE_HADDR case
....
The stack trace is:
H5SL_remove 1068
H5C_flush_single_entry 7993
H5C_flush_cache 1395
H5AC_flush 941
H5F_flush 1673
H5F_dest 996
H5F_try_close 1900
H5F_close 1750
H5I_dec_ref 1490
H5F_close 1951
I'll be adding printouts to see which variable/pointer is causing the
seg fault. The MemoryScape Frame shows:
..............
Stack Frame
Function "H5SL_remove":
slist: 0x0b790fc0 (Allocated) -> (H5SL_t)
key: 0x0b9853f8 (Allocated Interior) ->
0x000000000001affc (110588)
Block "$b8":
_last: 0x0b772270 (Allocated) -> (H5SL_node_t)
_llast: 0x0001affc -> (H5SL_node_t)
_next: 0x0b9855c0 (Allocated) -> (H5SL_node_t)
_drop: 0x0b772270 (Allocated) -> (H5SL_node_t)
_ldrop: 0x0b772270 (Allocated) -> (H5SL_node_t)
_count: 0x00000000 (0)
_i: <Bad address: 0x00000000>
Local variables:
x: <Bad address: 0x00000000>
hashval: <Bad address: 0x00000000>
ret_value: <Bad address: 0x00000000>
FUNC: "H5SL_remove"
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
................
Some of the variables have bad addresses, such as x, which was set by
"x = slist->header;", where slist is the skip list.
These appear to be internal API functions, and I'm wondering how I could
be offending them from high-level API calls and the file interfaces. What
could still be in the H5C cache when
H5Fget_obj_count(fileID, H5F_OBJ_ALL) returns 1
and
H5Fget_obj_count(fileID, H5F_OBJ_DATASET | H5F_OBJ_GROUP | H5F_OBJ_DATATYPE | H5F_OBJ_ATTR) returns 0
for the file the code is trying to close?
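For reference, a minimal sketch (not the actual application code) of that
open-object check as it runs just before the close; the helper name
report_open_objects is my own:

#include <hdf5.h>
#include <cstdio>

// Hypothetical helper: print what the library still considers open for this file.
static void report_open_objects(hid_t fileID)
{
    ssize_t all   = H5Fget_obj_count(fileID, H5F_OBJ_ALL);
    ssize_t other = H5Fget_obj_count(fileID, H5F_OBJ_DATASET | H5F_OBJ_GROUP |
                                             H5F_OBJ_DATATYPE | H5F_OBJ_ATTR);
    // In the failing case this reports all=1 (the file itself) and other=0.
    std::printf("open objects: all=%ld, other=%ld\n", (long)all, (long)other);
}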
On 12/03/2010 11:33 AM, Roger Martin wrote:
Hi,
Using HDF5 1.8.5 and 1.8.6-pre2 with OpenMPI 1.4.3 on Linux (RHEL4 and RHEL5).
In a case where the HDF5 operations aren't using MPI but build an .h5
file exclusive to each individual MPI job/process:
The create,
currentFileID = H5Fcreate(filePath.c_str(), H5F_ACC_TRUNC,
H5P_DEFAULT, H5P_DEFAULT);
and many file operations using the high-level (HL) methods, including
packet tables, tables, datasets, etc., all perform successfully.
Then, near the end of each individual process,
H5Fclose(currentFileID);
is called but never returns. A check for open objects says only one
file object is open and no other objects (group, dataset, etc.). No
other software or process is acting on this .h5 file; it is named
exclusively for the one job it is associated with.
This isn't an attempt at parallel HDF5 under MPI. In another scenario
parallel HDF5 is working the collective way just fine. This current
issue is for people who don't have, or don't want, a parallel file
system, so I made a coarse-grained MPI setup that runs independent jobs
for these folks. Each job has its own .h5 file opened with
H5Fcreate(filePath.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
Where should I look?
I'll try to make a small example test case for show and tell.
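Roughly the shape that test case will take (a sketch only, not the real
code; the file path, dataset name, and sizes are placeholders):

#include <hdf5.h>
#include <hdf5_hl.h>
#include <string>
#include <cstdio>

int main()
{
    // In the real code the path is unique per MPI job; this is a placeholder.
    std::string filePath = "job_0.h5";

    hid_t currentFileID = H5Fcreate(filePath.c_str(), H5F_ACC_TRUNC,
                                    H5P_DEFAULT, H5P_DEFAULT);
    if (currentFileID < 0) return 1;

    // Stand-in for the many high-level writes (packet tables, tables, datasets).
    hsize_t dims[1] = {10};
    double  data[10] = {0};
    H5LTmake_dataset_double(currentFileID, "/placeholder", 1, dims, data);

    // This is the call that never returns in the failing runs.
    herr_t status = H5Fclose(currentFileID);
    std::printf("H5Fclose returned %d\n", (int)status);
    return (status < 0) ? 1 : 0;
}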
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org