Hi Quincey,

Thanks for your reply.
This helped considerably. I can dump one of my files in 16.5 minutes now, down from the 4+ hours it took before. However, this is still the slowest part of my pipeline. Another order of magnitude improvement would be welcome, of course ;), but I'd be really happy if we could just halve my current runtime for it. Any other ideas?
Secondly, my previous HDF5 was simply installed as part of Ubuntu. I imagine pre-installed HDF5 versions are common for many users. Could I request that "no thread safety" be made a runtime option, which the command-line tools could set implicitly? As shown, it provides a huge performance benefit, and it would be significantly easier to use than a rebuild, because users would not need to compile their own HDF5.
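On the runtime question, one possible shortcut, offered as a sketch and not as a tested fix: the header in [1] reports CONTIGUOUS storage, FILTERS NONE, OFFSET 1400 and SIZE 3914505000, and the datatype is already little-endian (H5T_STD_U16LE), which is what `-b LE` asks for. Under those conditions the dataset bytes should sit in one unbroken run in the file, so they could in principle be copied out without going through the HDF5 library at all. The helper names below (`copy_byte_range`, `expected_size`) are hypothetical, and the sketch assumes that OFFSET is an absolute byte position in the file (for example, no user block in front of the HDF5 superblock).

```python
# Hypothetical helpers; nothing here is part of HDF5 or its tools.

def expected_size(dims, bytes_per_element):
    """Number of bytes occupied by a dense array with the given dimensions."""
    total = bytes_per_element
    for extent in dims:
        total *= extent
    return total


def copy_byte_range(src_path, dst_path, offset, size, block_size=8 * 1024 * 1024):
    """Copy `size` bytes starting at byte `offset` of src_path into dst_path.

    Reads in blocks so that a multi-gigabyte dataset never has to fit in memory.
    Returns the number of bytes written.
    """
    remaining = size
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(offset)
        while remaining > 0:
            block = src.read(min(block_size, remaining))
            if not block:
                # The file is shorter than the header claimed.
                raise EOFError(f"file ended with {remaining} bytes still expected")
            dst.write(block)
            remaining -= len(block)
    return size - remaining


# Values copied from the header dump in [1].
DIMS = (301, 2550, 2550)      # DATASPACE
BYTES_PER_ELEMENT = 2         # H5T_STD_U16LE
DATA_OFFSET = 1400            # STORAGE_LAYOUT OFFSET (assumed absolute)
DATA_SIZE = 3914505000        # STORAGE_LAYOUT SIZE

# Sanity check: the reported SIZE matches the dataspace times the element size.
assert expected_size(DIMS, BYTES_PER_ELEMENT) == DATA_SIZE

# Intended use (file names taken from the h5dump command in the quoted message):
#   copy_byte_range("infile.h5", "outfile.raw", DATA_OFFSET, DATA_SIZE)
```

This only applies to contiguous, unfiltered datasets whose on-disk byte order already matches the desired output; chunked or compressed data would still need h5dump or the library. It would be worth comparing the result against an h5dump output for one file before trusting it.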
Thanks,

-tom

On 04/25/2012 05:53 PM, Quincey Koziol wrote:
Hi Tom,

Looks like you are working with a thread-safe build of HDF5, which is unnecessary for the command-line tools. You could rebuild the HDF5 distribution (I would suggest moving up to 1.8.8 or the 1.8.9 prerelease) without the thread-safe configure flag, and that should get rid of the mutex issues.

Quincey

On Apr 25, 2012, at 9:28 AM, tom fogal wrote:

I am getting really awful performance using 'h5dump' to dump a scalar field as a binary file. It takes literally hours, whereas an h5copy -f ref takes just under 2 minutes, and a simple 'cp' is a little bit quicker than that.

My file header is reproduced below [1]. I am using

    h5dump -b LE -d /C00 -o outfile.raw infile.h5

to convert. For comparison, 'h5copy' is run thusly:

    h5copy -s C00 -d C00 -i infile.h5 -o ./testing.h5 -v -f ref

While running, h5dump pegs a core at 98+% CPU usage. I've tried attaching gdb to the process while it's running, so that I can obtain some poor-man's profiling. One popular stacktrace is appended below [2]. I guess it's locking and unlocking a mutex constantly?

Other traces I have seen multiple times:

    H5I_object_verify called from H5Tequal
    __pthread_setcancelstate from H5TS_cancel_count_inc from H5Tequal
    __pthread_mutex_lock from H5TS_mutex_lock from H5open
    H5T_cmp from H5Tequal (rarely)

If locking is indeed the problem, can I disable it at runtime somehow? These files are only being accessed by one process at a time, h5dump isn't even multithreaded anyway, and furthermore the access is purely read-only.

I am using HDF5 1.8.4.

Please enlighten me as to how I can get reasonable performance out of these files.
Thanks,

-tom

[1]
$ h5dump -p -H TS_2011_12_26/TS_C00_0_16.h5
HDF5 "TS_C00_0_16.h5" {
GROUP "/" {
   DATASET "C00" {
      DATATYPE  H5T_STD_U16LE
      DATASPACE  SIMPLE { ( 301, 2550, 2550 ) / ( 301, 2550, 2550 ) }
      STORAGE_LAYOUT {
         CONTIGUOUS
         SIZE 3914505000
         OFFSET 1400
      }
      FILTERS {
         NONE
      }
      FILLVALUE {
         FILL_TIME H5D_FILL_TIME_IFSET
         VALUE  0
      }
      ALLOCATION_TIME {
         H5D_ALLOC_TIME_LATE
      }
   }
}
}

[2]
(gdb) bt
#0  __pthread_mutex_lock (mutex=0x7ff2d1e7fac8) at pthread_mutex_lock.c:47
#1  0x00007ff2d1bf67d6 in H5TS_mutex_unlock () from /usr/lib/libhdf5.so.6
#2  0x00007ff2d19105b8 in H5open () from /usr/lib/libhdf5.so.6
#3  0x0000000000420501 in ?? ()
#4  0x000000000041fbf6 in ?? ()
#5  0x0000000000416e7f in ?? ()
#6  0x000000000041c856 in ?? ()
#7  0x000000000041cea9 in ?? ()
#8  0x000000000040abaf in ?? ()
#9  0x000000000040a20d in ?? ()
#10 0x000000000040d3c6 in ?? ()
#11 0x000000000040f387 in ?? ()
#12 0x00007ff2d156530d in __libc_start_main (main=0x40eae4, argc=8, ubp_av=0x7fffa529e648, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffa529e638) at libc-start.c:226
#13 0x0000000000405349 in ?? ()

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
