Hi Roger,

On Dec 7, 2010, at 2:06 PM, Roger Martin wrote:

> Further:
> 
> Debugging with MemoryScape:
> Reveals a segfault in H5SL.c (1.8.5) at line 1068
> ...1068....
>            H5SL_REMOVE(SCALAR, slist, x, const haddr_t, key, -) 
> //H5SL_TYPE_HADDR case
> ....
> 
> The stack trace is:
> H5SL_remove                     1068
> H5C_flush_single_entry      7993
> H5C_flush_cache                1395
> H5AC_flush                          941
> H5F_flush                           1673
> H5F_dest                              996
> H5F_try_close                     1900
> H5F_close                           1750
> H5I_dec_ref                         1490
> H5F_close                           1951
> 
> I'll be adding printouts to see which variable/pointer is causing the
> segfault.  The MemoryScape Frame shows:
> ..............
> Stack Frame
> Function "H5SL_remove":
>  slist:                       0x0b790fc0 (Allocated) -> (H5SL_t)
>  key:                         0x0b9853f8 (Allocated Interior) -> 0x000000000001affc (110588)
> Block "$b8":
>  _last:                       0x0b772270 (Allocated) -> (H5SL_node_t)
>  _llast:                      0x0001affc -> (H5SL_node_t)
>  _next:                       0x0b9855c0 (Allocated) -> (H5SL_node_t)
>  _drop:                       0x0b772270 (Allocated) -> (H5SL_node_t)
>  _ldrop:                      0x0b772270 (Allocated) -> (H5SL_node_t)
>  _count:                      0x00000000 (0)
>  _i: <Bad address: 0x00000000>
> Local variables:
>  x: <Bad address: 0x00000000>
>  hashval: <Bad address: 0x00000000>
>  ret_value: <Bad address: 0x00000000>
>  FUNC:                        "H5SL_remove"
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ................
> 
> Several of the variables show bad addresses, such as x, which was set by
> "x = slist->header;" (slist is the skip list).
> 
> These appear to be internal API functions, and I'm wondering how I could be
> offending them from high-level API calls and file interfaces.  What could be
> in the H5C cache when
> H5Fget_obj_count(fileID, H5F_OBJ_ALL) = 1
> and H5Fget_obj_count(fileID, H5F_OBJ_DATASET | H5F_OBJ_GROUP |
> H5F_OBJ_DATATYPE | H5F_OBJ_ATTR) = 0
> for the file the code is trying to close?

        Yes, you are correct, that shouldn't happen. :-/  Do you have a simple 
C program you can send to show this failure?

        Quincey

> On 12/03/2010 11:33 AM, Roger Martin wrote:
>> Hi,
>> 
>> Using HDF5 1.8.5 and 1.8.6 pre2; Open MPI 1.4.3 on Linux RHEL4 and RHEL5
>> 
>> 
>> In a case where the hdf5 operations aren't using MPI but build an h5 file 
>> exclusive to individual MPI jobs/processes:
>> 
>> The create:
>> currentFileID = H5Fcreate(filePath.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT, 
>> H5P_DEFAULT);
>> 
>> and many file operations using the HL methods, including packet tables,
>> tables, and datasets, perform successfully.
>> 
>> Then, near the end of each individual process,
>> H5Fclose(currentFileID);
>> is called but doesn't return.  A check for open objects says only one file
>> object is open and no other objects (group, dataset, etc.).  No other
>> software or process is acting on this h5 file; it is named exclusively for
>> the one job it is associated with.
>> 
>> This isn't an attempt at parallel HDF5 with MPI.  In another scenario,
>> parallel HDF5 is working the collective way just fine.  This current issue
>> is for people who don't have or want a parallel file system, so I made a
>> coarse-grained MPI setup to run independent jobs for these folks.  Each job
>> has its own h5 file opened with H5Fcreate(filePath.c_str(), H5F_ACC_TRUNC,
>> H5P_DEFAULT, H5P_DEFAULT);
>> 
>> Where should I look?
>> 
>> I'll try to make a small example test case for show and tell.
>> 
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>> 
> 
> 

