I am getting very poor performance from 'h5dump' when dumping a scalar field to a binary file. It takes hours, whereas 'h5copy -f ref' on the same dataset takes just under 2 minutes, and a plain 'cp' of the file is slightly quicker than that.

My file header is reproduced below [1].  I am using

  h5dump -b LE -d /C00 -o outfile.raw infile.h5

to convert.  For comparison, 'h5copy' is run thusly:

  h5copy -s C00 -d C00 -i infile.h5 -o ./testing.h5 -v -f ref

While running, h5dump pegs a core at 98+% CPU usage.

I've tried attaching gdb to the process while it's running, to get some poor-man's profiling. One frequent stack trace is appended below [2]; it looks as though the library is constantly locking and unlocking a mutex. Other traces I have seen multiple times:

  - H5I_object_verify, called from H5Tequal
  - __pthread_setcancelstate, called from H5TS_cancel_count_inc, called from H5Tequal
  - __pthread_mutex_lock, called from H5TS_mutex_lock, called from H5open
  - H5T_cmp, called from H5Tequal (rarely)

If locking is indeed the problem, can I disable it at runtime somehow? These files are only being accessed by one process at a time, h5dump isn't even multithreaded anyway, and furthermore the access is purely read-only.

I am using HDF5 1.8.4. Please enlighten me as to how I can get reasonable performance out of these files.
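
One workaround I am considering is sketched below. Since [1] reports the dataset as CONTIGUOUS with no filters, and the on-disk type is already H5T_STD_U16LE, the raw output should simply be the SIZE bytes starting at OFFSET in the file. This assumes that OFFSET is an absolute byte address (i.e. no user block) and that 'h5dump -b LE' would write the elements in the same order as they are stored; I have not verified either. The function names are my own and are not part of any HDF5 tool.

```python
# Values taken from the header dump in [1].
DIMS = (301, 2550, 2550)
BYTES_PER_ELEMENT = 2          # H5T_STD_U16LE
OFFSET = 1400                  # assumed to be an absolute file offset
SIZE = 3914505000


def expected_size(dims, itemsize):
    """Number of bytes a dense array with these dimensions occupies."""
    total = itemsize
    for extent in dims:
        total *= extent
    return total


def copy_byte_range(src_path, dst_path, offset, size, chunk_size=64 * 1024 * 1024):
    """Copy `size` bytes starting at `offset` from src_path into dst_path.

    Reads in chunks so that a multi-gigabyte dataset never has to fit in
    memory. Returns the number of bytes written.
    """
    remaining = size
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(offset)
        while remaining > 0:
            block = src.read(min(chunk_size, remaining))
            if not block:
                raise EOFError(
                    f"source ended with {remaining} bytes still expected"
                )
            dst.write(block)
            remaining -= len(block)
    return size - remaining

# Sanity check before copying: the reported SIZE should match the dataspace.
#   assert expected_size(DIMS, BYTES_PER_ELEMENT) == SIZE
# Then, for the files named in the commands above:
#   copy_byte_range("infile.h5", "outfile.raw", OFFSET, SIZE)
```

If the assumptions hold, this should run at roughly the speed of 'cp', but it bypasses the library entirely, so it would break for chunked, compressed, or big-endian datasets.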

Thanks,

-tom

[1]
$ h5dump -p -H TS_2011_12_26/TS_C00_0_16.h5
HDF5 "TS_C00_0_16.h5" {
GROUP "/" {
   DATASET "C00" {
      DATATYPE  H5T_STD_U16LE
      DATASPACE  SIMPLE { ( 301, 2550, 2550 ) / ( 301, 2550, 2550 ) }
      STORAGE_LAYOUT {
         CONTIGUOUS
         SIZE 3914505000
         OFFSET 1400
      }
      FILTERS {
         NONE
      }
      FILLVALUE {
         FILL_TIME H5D_FILL_TIME_IFSET
         VALUE  0
      }
      ALLOCATION_TIME {
         H5D_ALLOC_TIME_LATE
      }
   }
}
}

[2]
(gdb) bt
#0  __pthread_mutex_lock (mutex=0x7ff2d1e7fac8) at pthread_mutex_lock.c:47
#1  0x00007ff2d1bf67d6 in H5TS_mutex_unlock () from /usr/lib/libhdf5.so.6
#2  0x00007ff2d19105b8 in H5open () from /usr/lib/libhdf5.so.6
#3  0x0000000000420501 in ?? ()
#4  0x000000000041fbf6 in ?? ()
#5  0x0000000000416e7f in ?? ()
#6  0x000000000041c856 in ?? ()
#7  0x000000000041cea9 in ?? ()
#8  0x000000000040abaf in ?? ()
#9  0x000000000040a20d in ?? ()
#10 0x000000000040d3c6 in ?? ()
#11 0x000000000040f387 in ?? ()
#12 0x00007ff2d156530d in __libc_start_main (main=0x40eae4, argc=8,
    ubp_av=0x7fffa529e648, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=0x7fffa529e638) at libc-start.c:226
#13 0x0000000000405349 in ?? ()
