Hi Tom,

On Apr 27, 2012, at 5:46 AM, tom fogal wrote:

> Hi Quincey,
> 
> Thanks for your reply.
> 
> This helped considerably.  I can dump one of my files in 16.5 minutes now, 
> down from the 4+ hours it took before.  However, this is still the slowest 
> part of my pipeline.  Another order of magnitude improvement would be 
> welcome, of course ;), but I'd be really happy if we could just halve my 
> current runtime for it.  Any other ideas?

        Dunno offhand.  Can you run it under gprof and see where the remaining time goes?
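
        Something along these lines should get a usable profile (just a sketch: 
the install prefix is a placeholder, and it assumes you build the 1.8.x source 
yourself with profiling enabled):

            CFLAGS="-pg -O2" ./configure --prefix=$HOME/hdf5-prof --disable-shared
            make && make install
            $HOME/hdf5-prof/bin/h5dump -b LE -d /C00 -o outfile.raw infile.h5
            gprof $HOME/hdf5-prof/bin/h5dump gmon.out > h5dump-profile.txt

        I'd suggest --disable-shared for this, since gprof generally won't 
attribute time spent inside shared libraries; a static build gives a more 
complete picture of where h5dump's time actually goes.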

> Secondly, my previous HDF5 was simply installed as part of Ubuntu.  I imagine 
> pre-installed HDF5 versions are common for many users.  Could I request that 
> "no thread safety" be made a runtime option, which the command line tools 
> could set implicitly?  As shown, it provides a huge performance benefit, and 
> it would be significantly easier to use, since users wouldn't need to compile 
> their own HDF5.

        Hmm, that could be done, yes.  I'll file an issue for it.
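
        Until then, the workaround is what you're already doing: keep a 
separate, non-threadsafe build next to the packaged one and point the tools at 
it.  Thread safety is off by default in the 1.8 series, so it's just a matter 
of leaving out --enable-threadsafe.  Roughly (the prefix is a placeholder 
again):

            ./configure --prefix=$HOME/hdf5-serial
            make && make install
            $HOME/hdf5-serial/bin/h5dump -b LE -d /C00 -o outfile.raw infile.h5

        That leaves the Ubuntu-packaged library in place for anything else that 
links against it.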

                Quincey


> Thanks,
> 
> -tom
> 
> On 04/25/2012 05:53 PM, Quincey Koziol wrote:
>> Hi Tom,
>>      Looks like you are working with a thread-safe build of HDF5, which is 
>> unnecessary for the command-line tools.  You could rebuild the HDF5 
>> distribution (I would suggest moving up to 1.8.8 or the 1.8.9 prerelease) 
>> without the thread-safe configure flag, and that should get rid of the mutex 
>> issues.
>> 
>>      Quincey
>> 
>> On Apr 25, 2012, at 9:28 AM, tom fogal wrote:
>> 
>>> I am getting really awful performance using 'h5dump' to dump a scalar field 
>>> as a binary file.  It takes literally hours, whereas an h5copy -f ref takes 
>>> just under 2 minutes, and a simple 'cp' is a little bit quicker than that.
>>> 
>>> My file header is reproduced below [1].  I am using
>>> 
>>>  h5dump -b LE -d /C00 -o outfile.raw infile.h5
>>> 
>>> to convert.  For comparison, 'h5copy' is run as follows:
>>> 
>>>  h5copy -s C00 -d C00 -i infile.h5 -o ./testing.h5 -v -f ref
>>> 
>>> While running, h5dump pegs a core at 98+% CPU usage.
>>> 
>>> I've tried attaching gdb to the process while it's running, so that I can 
>>> obtain some poor-man's profiling.  One frequently recurring stack trace is 
>>> appended below [2]; it looks like the library is constantly locking and 
>>> unlocking a mutex.  Other traces I have seen several times: H5I_object_verify 
>>> called from H5Tequal; __pthread_setcancelstate from H5TS_cancel_count_inc 
>>> from H5Tequal; __pthread_mutex_lock from H5TS_mutex_lock from H5open; and, 
>>> rarely, H5T_cmp from H5Tequal.
>>> 
>>> If locking is indeed the problem, can I disable it at runtime somehow? 
>>> These files are only being accessed by one process at a time, h5dump isn't 
>>> even multithreaded anyway, and furthermore the access is purely read-only.
>>> 
>>> I am using HDF5 1.8.4.  Please enlighten me as to how I can get reasonable 
>>> performance out of these files.
>>> 
>>> Thanks,
>>> 
>>> -tom
>>> 
>>> [1]
>>> $ h5dump -p -H TS_2011_12_26/TS_C00_0_16.h5
>>> HDF5 "TS_C00_0_16.h5" {
>>> GROUP "/" {
>>>   DATASET "C00" {
>>>      DATATYPE  H5T_STD_U16LE
>>>      DATASPACE  SIMPLE { ( 301, 2550, 2550 ) / ( 301, 2550, 2550 ) }
>>>      STORAGE_LAYOUT {
>>>         CONTIGUOUS
>>>         SIZE 3914505000
>>>         OFFSET 1400
>>>      }
>>>      FILTERS {
>>>         NONE
>>>      }
>>>      FILLVALUE {
>>>         FILL_TIME H5D_FILL_TIME_IFSET
>>>         VALUE  0
>>>      }
>>>      ALLOCATION_TIME {
>>>         H5D_ALLOC_TIME_LATE
>>>      }
>>>   }
>>> }
>>> }
>>> 
>>> [2]
>>> (gdb) bt
>>> #0  __pthread_mutex_lock (mutex=0x7ff2d1e7fac8) at pthread_mutex_lock.c:47
>>> #1  0x00007ff2d1bf67d6 in H5TS_mutex_unlock () from /usr/lib/libhdf5.so.6
>>> #2  0x00007ff2d19105b8 in H5open () from /usr/lib/libhdf5.so.6
>>> #3  0x0000000000420501 in ?? ()
>>> #4  0x000000000041fbf6 in ?? ()
>>> #5  0x0000000000416e7f in ?? ()
>>> #6  0x000000000041c856 in ?? ()
>>> #7  0x000000000041cea9 in ?? ()
>>> #8  0x000000000040abaf in ?? ()
>>> #9  0x000000000040a20d in ?? ()
>>> #10 0x000000000040d3c6 in ?? ()
>>> #11 0x000000000040f387 in ?? ()
>>> #12 0x00007ff2d156530d in __libc_start_main (main=0x40eae4, argc=8,
>>>    ubp_av=0x7fffa529e648, init=<optimized out>, fini=<optimized out>,
>>>    rtld_fini=<optimized out>, stack_end=0x7fffa529e638) at libc-start.c:226
>>> #13 0x0000000000405349 in ?? ()

