I don't have experience optimizing sparse matrix products with HDF5, but
I imagine you would want to read back a sparse representation of each
matrix so that the multiplies are efficient. Or is it as Mark describes,
where you want a compressed representation on disk but work with full
matrices in memory? If the former, you would build a sparse
representation in memory yourself and write that to the dataset. I would
think of a chunk as more of a low-level storage unit: chunks are of a
fixed size (although they can be compressed with something like gzip),
whereas a sparse matrix representation varies in size depending on how
dense the original matrix was. I would just read the whole sparse matrix
back, and only worry about the chunk layout when optimizing I/O performance.
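Something like this is what I have in mind (an untested sketch, assuming
scipy's CSR layout and h5py; the file, group, and dataset names are just
illustrative):

    import h5py
    import scipy.sparse as sp

    def write_csr(f, name, mat):
        # Store the three CSR arrays as separate 1-D datasets; their lengths
        # depend on the matrix density, which is why they don't map neatly
        # onto fixed-size chunks of a dense 2-D dataset.
        grp = f.create_group(name)
        grp.create_dataset("data", data=mat.data, compression="gzip")
        grp.create_dataset("indices", data=mat.indices, compression="gzip")
        grp.create_dataset("indptr", data=mat.indptr, compression="gzip")
        grp.attrs["shape"] = mat.shape

    def read_csr(f, name):
        # Read the whole sparse representation back and rebuild it in memory.
        grp = f[name]
        return sp.csr_matrix(
            (grp["data"][...], grp["indices"][...], grp["indptr"][...]),
            shape=tuple(grp.attrs["shape"]))

    A = sp.random(10000, 10000, density=0.001, format="csr")
    B = sp.random(10000, 10000, density=0.001, format="csr")
    with h5py.File("matrices.h5", "w") as f:
        write_csr(f, "A", A)
        write_csr(f, "B", B)
    with h5py.File("matrices.h5", "r") as f:
        C = read_csr(f, "A") @ read_csr(f, "B")  # sparse-sparse product in memory

The chunking of those 1-D datasets can then be tuned purely for I/O
throughput, independent of the sparsity structure.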
best,
David
On 08/12/15 10:16, Miller, Mark C. wrote:
Have a look at this reference . . .
http://www.hdfgroup.org/HDF5/doc_resource/H5Fill_Values.html
as well as documentation on H5Pset_fill_value and H5Pset_fill_time.
I have a vague recollection that if you create a large, chunked
dataset but then only write to certain parts of it, HDF5 is smart
enough to store only those chunks in the file that actually have
non-fill values within them. The above ref seems to be consistent with
this (except in parallel I/O settings).
Is this what you mean by a 'sparse format'?
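For illustration only, a small h5py sketch of that behavior (the file name,
dataset name, shape, and chunk size here are made up): a large chunked
dataset is created with a fill value, a single block is written, and the
file should stay small because only that one chunk gets storage.

    import h5py
    import numpy as np

    with h5py.File("big.h5", "w") as f:
        # h5py's fillvalue keyword corresponds to H5Pset_fill_value.
        dset = f.create_dataset("A", shape=(100000, 100000), dtype="f8",
                                chunks=(1000, 1000), fillvalue=0.0)
        # Write one 1000x1000 block; unwritten chunks are never allocated,
        # and reading them just returns the fill value.
        dset[:1000, :1000] = np.random.rand(1000, 1000)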
However, I am not sure why you need to know how HDF5 has handled the
chunks *in the file*, unless you are attempting to write an out-of-core
matrix multiply.
I think you can easily determine which blocks are 'empty' by examining
each block you've read into memory: any block that consists entirely of
the fill value is, of course, an empty block. You can then use that
information to bootstrap your sparse matrix multiply. So, you could read
the matrix several blocks at a time rather than all at once, test each
returned block for all-fill-value or not, and build up your sparse
in-memory representation from that. If you read the matrix in a single
H5Dread call, however, you would wind up with a fully instantiated
matrix, full of fill values, in memory *before* you could begin to
reduce that storage to a sparse format.
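A rough sketch of that block-wise approach, assuming h5py and scipy (the
block size is illustrative and would normally match the chunk shape):

    import h5py
    import numpy as np
    import scipy.sparse as sp

    def load_sparse_blockwise(path, name, block=1000, fill=0.0):
        # Read a chunked 2-D dataset block by block, skip all-fill blocks,
        # and accumulate the remaining entries into a sparse matrix.
        rows, cols, vals = [], [], []
        with h5py.File(path, "r") as f:
            dset = f[name]
            nrow, ncol = dset.shape
            for i in range(0, nrow, block):
                for j in range(0, ncol, block):
                    blk = dset[i:i + block, j:j + block]
                    nz = np.nonzero(blk != fill)
                    if nz[0].size == 0:
                        continue  # all-fill block: nothing to keep
                    rows.append(nz[0] + i)
                    cols.append(nz[1] + j)
                    vals.append(blk[nz])
            shape = (nrow, ncol)
        if not vals:
            return sp.csr_matrix(shape)
        return sp.coo_matrix(
            (np.concatenate(vals),
             (np.concatenate(rows), np.concatenate(cols))),
            shape=shape).tocsr()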
I wonder if it might be possible to write your own custom 'filter'
that you applied during H5Dread that would do all this for you as
chunks are read from the file? It might be.
Mark
From: Hdf-forum <hdf-forum-boun...@lists.hdfgroup.org> on behalf of Aidan Macdonald <aidan.plenert.macdon...@gmail.com>
Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Date: Wednesday, August 12, 2015 9:05 AM
To: "hdf-forum@lists.hdfgroup.org" <hdf-forum@lists.hdfgroup.org>
Subject: [Hdf-forum] Fast Sparse Matrix Products by Finding Allocated Chunks
Hi,
I am using HDF5 from Python via h5py, but I am planning to move
to C/C++.
I am using HDF5 to store sparse matrices that I need to compute
matrix products on. I am using chunked storage, which 'appears' to
store the data in a block-sparse format. PLEASE CONFIRM that this
is true. I couldn't find documentation saying so, but judging by
the file sizes during data loading, my block-sparse assumption
seemed to hold.
I would like to do the matrix multiplies in a way that exploits the
sparsity of the data to make them faster. I can handle the
algorithmic aspect, but I can't figure out how to see which chunks
are allocated so that I can iterate over them.
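The kind of chunk iteration I am after would look roughly like the
sketch below. This is only a guess on my part: it assumes the low-level
chunk-query calls get_num_chunks / get_chunk_info are available, which
needs HDF5 >= 1.10.5 and a recent h5py, and it reuses hypothetical
file/dataset names.

    import h5py

    with h5py.File("matrices.h5", "r") as f:  # hypothetical names
        dsid = f["A"].id
        # Only chunks that were actually written appear in the chunk index.
        for i in range(dsid.get_num_chunks()):
            info = dsid.get_chunk_info(i)
            # info.chunk_offset gives the block's starting coordinates in
            # the dataset; info.size is its stored size in bytes.
            print(info.chunk_offset, info.size)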
If there is a better way to go about this (existing code!), please
let me know. I am new to HDF5, and thoroughly impressed.
Thank you,
Aidan Plenert Macdonald
Website <http://acsweb.ucsd.edu/%7Eamacdona/>
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5