Balint,
I am not sure whether pre-allocation will help performance, but there is a
good chance it will, especially if the chunks are small and there are a lot
of them: the default for chunked datasets is to allocate space incrementally
(chunk by chunk) as data is written to the dataset. If MATLAB has access to
the low-level HDF5 APIs (which I believe it does), you can call
H5Pset_alloc_time with alloc_time set to H5D_ALLOC_TIME_EARLY on a dataset
creation property list. As far as I can tell, there should be no need to
mess with the fill value or do any filling.
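To get a feel for how many chunks your layout involves, here is a rough back-of-the-envelope sketch in Python (not MATLAB). The array shape, the chunk shape (one chunk per index of the second dimension, spanning the full first and third dimensions) and the element size are hypothetical values chosen for illustration, not numbers from your message. It simply counts the chunks a hyperslab write intersects and how much of each chunk it fills.

```python
from math import prod


def chunks_touched(start, count, chunk):
    """Count the chunks intersected by the hyperslab that begins at `start`
    and spans `count` elements along each dimension."""
    n = 1
    for s, c, ch in zip(start, count, chunk):
        first = s // ch              # index of the first chunk along this axis
        last = (s + c - 1) // ch     # index of the last chunk along this axis
        n *= last - first + 1
    return n


# Hypothetical sizes, for illustration only.
shape = (512, 4096, 512)   # full 3-D array
chunk = (512, 1, 512)      # one chunk per index of the 2nd dimension
itemsize = 8               # bytes per element (double precision)

chunk_bytes = prod(chunk) * itemsize

for batch in (1, 16, 128):
    # Write `batch` consecutive slices along the 3rd dimension in one call.
    touched = chunks_touched((0, 0, 0), (shape[0], shape[1], batch), chunk)
    # Bytes landing in each touched chunk (valid while batch <= chunk[2]).
    per_chunk = chunk[0] * chunk[1] * batch * itemsize
    # Number of separate partial writes each chunk gets over the whole fill.
    writes_per_chunk = shape[2] // batch
    print(f"batch={batch:4d}: {touched} chunks touched per write, "
          f"{per_chunk} of {chunk_bytes} bytes per chunk, "
          f"{writes_per_chunk} partial writes per chunk in total")
```

Under this assumed layout every slice write touches every chunk but fills only a small fraction of each one. Batching does not reduce how many chunks a write touches; it reduces how many separate partial writes each chunk receives, and when the touched chunks do not all fit in HDF5's chunk cache, each of those partial writes may end up as a separate small I/O operation.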

You will need to create the property list first, set this property on it,
and then pass it to H5Dcreate. Also, I think MATLAB splits the HDF5 API into
classes, so the function might be called H5P.set_alloc_time or something
similar. It might also be worthwhile to check that your MATLAB version is
recent, so that it is compiled/linked against a recent HDF5 version/build.
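For reference, the same three steps look roughly like this through h5py's low-level wrappers (Python rather than MATLAB), which map closely onto the C API: `h5p.create` for H5Pcreate, `set_alloc_time` for H5Pset_alloc_time and `h5d.create` for H5Dcreate. This is a sketch assuming a reasonably recent h5py; the file name, dataset name, shape and chunk shape are hypothetical. `get_storage_size()` is used only to confirm that the space was allocated up front, and the linked HDF5 version is printed at the end.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical sizes and names, for illustration only.
shape = (64, 256, 64)
chunk = (64, 1, 64)          # one chunk per index of the 2nd dimension
path = os.path.join(tempfile.mkdtemp(), "prealloc_demo.h5")

with h5py.File(path, "w") as f:
    # 1. Create a dataset creation property list and set the chunk shape.
    dcpl = h5py.h5p.create(h5py.h5p.DATASET_CREATE)
    dcpl.set_chunk(chunk)
    # 2. Allocate every chunk when the dataset is created instead of
    #    incrementally as data is written (the default for chunked data).
    dcpl.set_alloc_time(h5py.h5d.ALLOC_TIME_EARLY)

    # 3. Pass the property list to the low-level dataset create call.
    space = h5py.h5s.create_simple(shape)
    dsid = h5py.h5d.create(f.id, b"data", h5py.h5t.NATIVE_DOUBLE, space, dcpl=dcpl)

    # Bytes already allocated in the file before anything is written.
    allocated_at_create = dsid.get_storage_size()
    early = dsid.get_create_plist().get_alloc_time() == h5py.h5d.ALLOC_TIME_EARLY

    # Write slices with a fixed index in the third dimension, one at a time.
    dset = h5py.Dataset(dsid)
    for k in range(shape[2]):
        dset[:, :, k] = np.full(shape[:2], k, dtype=np.float64)
    last_value = dset[0, 0, shape[2] - 1]

print("HDF5 library linked into h5py:", h5py.version.hdf5_version)
print("early allocation:", early, "| bytes allocated at create:", allocated_at_create)
```

The MATLAB equivalents presumably live in the H5P and H5D classes; their exact names and argument order should be checked against the MATLAB documentation for your release.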

Documentation for H5Pset_alloc_time may be found here:
http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetAllocTime

Good luck,
Izaak Beekman
===================================
(301)244-9367
Princeton University Doctoral Candidate
Mechanical and Aerospace Engineering
[email protected]

UMD-CP Visiting Graduate Student
Aerospace Engineering
[email protected]
[email protected]


On Tue, Dec 13, 2011 at 12:00 PM, <[email protected]> wrote:

> Send Hdf-forum mailing list submissions to
>        [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
> or, via email, send a message with subject or body 'help' to
>        [email protected]
>
> You can reach the person managing the list at
>        [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Hdf-forum digest..."
>
> Today's Topics:
>
>   1. Datasets keep old names after parent has been renamed?
>      (Darren Dale)
>   2. speeding up write of chunked HDF (Balint Takacs)
>
>
> ---------- Forwarded message ----------
> From: Darren Dale <[email protected]>
> To: HDF Users Discussion List <[email protected]>
> Cc:
> Date: Mon, 12 Dec 2011 12:29:18 -0500
> Subject: [Hdf-forum] Datasets keep old names after parent has been renamed?
> (Apologies if this gets posted twice)
>
> Someone reported a bug at the h5py issue tracker:
>
> ---
> import h5py
>
> # test setup
> fid = h5py.File('test.hdf5', 'w')
>
> g = fid.create_group('old_loc')
> g2 = g.create_group('group')
> d = g.create_dataset('dataset', data=0)
>
> print "before move:"
> print g2.name
> print d.name
>
> # now rename toplevel group
> g.parent.id.move('old_loc', 'new_loc')
>
> print "after move:"
> # old parent remains in dataset name, group is ok
> print g2.name
> print d.name
>
> # d.parent looks up the stale path '/old_loc', which no longer exists
> d.parent
>
> fid.close()
> ---
>
> That script produces the following output:
>
> ---
> before move:
> /old_loc/group
> /old_loc/dataset
> after move:
> /new_loc/group
> /old_loc/dataset
> Traceback (most recent call last):
>  File "move_error.py", line 24, in <module>
>   d.parent
>  File "/Users/darren/Library/Python/2.7/lib/python/site-packages/h5py/_hl/base.py", line 144, in parent
>   return self.file[posixpath.dirname(self.name)]
>  File "/Users/darren/Library/Python/2.7/lib/python/site-packages/h5py/_hl/group.py", line 128, in __getitem__
>   oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
>  File "h5o.pyx", line 176, in h5py.h5o.open (h5py/h5o.c:2814)
> KeyError: "unable to open object (Symbol table: Can't open object)"
> ---
>
> g.name and d.name simply return the result of h5i.get_name.
>
> d.parent just splits d.name at the last "/" and returns the first
> part of the split.
>
> g.parent.id.move calls H5Gmove2. I've read the warnings about
> corrupting data using H5Gmove at
> http://www.hdfgroup.org/HDF5/doc1.6/Groups.html#H5GUnlinkToCorrupt ,
> but the situation described there does not appear to be relevant to
> the problem we are seeing. Is h5py not performing the move properly,
> or could this be a bug in HDF5?
>
> Thanks,
> Darren
>
>
>
>
> ---------- Forwarded message ----------
> From: Balint Takacs <[email protected]>
> To: [email protected]
> Cc:
> Date: Tue, 13 Dec 2011 12:37:08 +0000
> Subject: [Hdf-forum] speeding up write of chunked HDF
> Hi all,
>
> I need to fill a huge 3D array, chunked in its second dimension. My data
> arrive as slices with a fixed index in the third dimension, so the layout
> needs to be re-ordered. The chunks are uncompressed. When the data is read,
> the access pattern sweeps along the second dimension, so the chunking
> layout makes sense. The data is stored on an SSD, so random access should
> be relatively fast. I cannot change the index order of the data.
>
> In theory, if the array were stored in a raw file, the data could be
> written continuously as it fills up. With HDF, however, this becomes
> painfully slow. The only way I have found to speed it up somewhat is to
> read as many slices as I can into memory and then write them together in
> batches, but I still see write rates below 2 MB/s on average.
>
> The file grows gradually as the slices are added. If this expansion
> requires re-ordering the entire data, that could explain the slow write
> speed. I was wondering whether pre-allocating the entire file could help,
> and what the best way to do it is. I could not find any related API
> function. I know the total data size before data collection starts.
>
> The only idea I have so far is to fill the array with some dummy value
> (not the fill value) by sweeping through the chunking dimension before
> adding the slices. This would probably grow the file to its final size
> quickly, but I am not sure it helps at all, and it is definitely ugly.
>
> I am using MATLAB 2007a with the HDF5 1.6.5 library that comes with it.
>
> Thank you in advance for your comments.
>
> Regards,
>
> Balint
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org