Hi Mike,

Did you build the thread-safe version of the library (i.e., configure with --enable-threadsafe)?
Dana

From: Hdf-forum [mailto:[email protected]] On Behalf Of Cook, Michael J (GE Healthcare)
Sent: Friday, January 03, 2014 5:03 PM
To: [email protected]
Subject: [Hdf-forum] H5SL_insert_common(): can't insert duplicate key

My HDF5 1.8.11 application runs on a multi-core Linux machine (dual Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz). It is a multi-threaded process in which, at times, two different threads read the same HDF5 file for different reasons. Both threads read partial datasets. Each thread calls H5Fopen on the file separately, read-only and with DIRECT_IO, and holds its own file identifier. Both threads call H5Dread( dset, dtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, ptrForData ) with a typical request size of ~32MB.

Problem: Occasionally the two threads get in sync and perform H5Dread() on exactly the same partial dataset. When this happens, the application errors out with the following HDF5 error stack:

HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
  #000: H5Dio.c line 182 in H5Dread(): can't read data
    major: Dataset
    minor: Read failed
  #001: H5Dio.c line 539 in H5D__read(): can't initialize I/O info
    major: Dataset
    minor: Unable to initialize object
  #002: H5Dchunk.c line 827 in H5D__chunk_io_init(): unable to create file chunk selections
    major: Dataset
    minor: Unable to initialize object
  #003: H5Dchunk.c line 1301 in H5D__create_chunk_file_map_hyper(): can't insert chunk into skip list
    major: Dataspace
    minor: Unable to insert object
  #004: H5SL.c line 989 in H5SL_insert(): can't create new skip list node
    major: Skip Lists
    minor: Unable to insert object
  #005: H5SL.c line 669 in H5SL_insert_common(): can't insert duplicate key
    major: Skip Lists
    minor: Unable to insert object

Note that we have implemented a mutex around the HDF5 library calls to guarantee that no two threads are inside the library simultaneously.
Furthermore, our HDF5 wrapper class performs each set of related HDF5 library calls as an atomic sequence. For example, to read a partial dataset, the entire sequence of calls H5Dopen2, H5Dget_type, H5Dget_space, H5Sget_simple_extent_ndims, H5Sget_simple_extent_dims, H5Sselect_hyperslab, H5Screate_simple, H5Dread, ... is performed under a single acquisition of the mutex.

Our investigation suggests the error occurs only when the two threads happen to issue exactly the same H5Dread request. The two requests are serialized by our library mutex, and the second identical H5Dread fails with the error stack above. The failure is very difficult to reproduce, which suggests a small timing window.

Can anyone explain the error and/or suggest ways to prevent it? What is an HDF5 'skip list'? The header comment of H5SL_insert() in H5SL.c says:

"COMMENTS, BUGS, ASSUMPTIONS Inserting an item with the same key as an existing object fails."

Is this a known bug?

Regards,
Mike C.
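
For readers following the thread, the locking pattern described in the message can be sketched roughly as below. This is only an illustration under stated assumptions, not the poster's actual wrapper: it assumes a single process-wide pthread mutex, and the names h5_lock, h5_sequence_fn, run_h5_sequence, and demo_sequence are hypothetical. The real sequence of HDF5 calls (H5Dopen2 through H5Dread) would take the place of the stand-in callback.

```c
#include <pthread.h>

/* Hypothetical: one process-wide lock guarding every HDF5 library call. */
static pthread_mutex_t h5_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical callback type: performs one complete sequence of library
 * calls and returns 0 on success or a negative value on failure. */
typedef int (*h5_sequence_fn)(void *ctx);

/* Run a whole call sequence under a single lock acquisition, so that no
 * other thread can interleave its own library calls with this sequence. */
int run_h5_sequence(h5_sequence_fn fn, void *ctx)
{
    int rc;

    if (pthread_mutex_lock(&h5_lock) != 0)
        return -1;              /* could not take the lock */
    rc = fn(ctx);               /* the entire sequence runs while locked */
    pthread_mutex_unlock(&h5_lock);
    return rc;
}

/* Stand-in for a real read sequence: it only counts how many times it ran.
 * A real callback would open the dataset, select the hyperslab, call
 * H5Dread, and close every identifier it opened before returning. */
int demo_sequence(void *ctx)
{
    int *calls = (int *)ctx;
    (*calls)++;
    return 0;
}
```

A lock of this kind only serializes entry into the library. It does not isolate any state that the library itself shares between the two file identifiers, which is why the question of a thread-safe build still matters.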
