>
> Hmm. Is a distributed parallel solution an option for you? Or, are you by
> chance attempting this on a GPU? ~1 TB is 10^6 unknowns, or thereabouts.
> That is a fairly large matrix to attempt in a single-node setting (hence
> the out-of-core approach you are taking, I guess).


Yeah, I am using MPI to distribute (currently to 20 machines with 8 cores
apiece). And no, I am not using a GPU, for two reasons. First, I don't know
much about them. And second, I think the orders of magnitude say it's a bad
idea. Remember that the 1 TB is sparse, so we are looking at matrices of size
about 10^6 x 10^6 or bigger. A typical row might have 1500 nonzero elements,
and the dimensionality is about 4*10^6, so sparse multiplication pays off a
lot.
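
Back-of-envelope, with the toy numbers above (nothing measured):

nnz_per_row = 1500           # typical nonzeros in one row
n_cols = 4 * 10**6           # full row dimensionality
# A dense dot product touches every column; a sparse one only the nonzeros.
print(n_cols / float(nnz_per_row))   # ~2700x fewer multiply-adds per row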

> Do you have any control over the data in the file? If so, why not have the
> data producer write an auxiliary dataset to the file: a bit-field that is
> one for a non-empty chunk in the matrix and zero for an empty chunk. You
> could read that dataset first (it's small). Then, use it to bootstrap the
> read and compute process.


This is a good idea, although it means more coding and debugging. I'll try to
see if there is a "built-in" way first, then go for this if not. Thanks.
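
For the record, a minimal h5py sketch of the bit-field idea (the file and
dataset names are made up, and one byte per flag is used for simplicity; a
real bit-field would pack 8 flags per byte):

import numpy as np
import h5py

# Producer side: alongside the big matrix, write one flag per chunk:
# 1 = chunk has data, 0 = chunk is empty.
with h5py.File('matrix.h5', 'a') as f:
    dset = f['matrix']                         # the chunked 2-D dataset
    crows, ccols = dset.chunks
    shape = (-(-dset.shape[0] // crows),       # ceil(rows / crows)
             -(-dset.shape[1] // ccols))       # ceil(cols / ccols)
    flags = f.require_dataset('chunk_flags', shape=shape, dtype='u1')
    # ... producer sets flags[i, j] = 1 whenever it writes chunk (i, j) ...

# Consumer side: read the small flag dataset first, then touch only the
# chunks that are marked non-empty.
with h5py.File('matrix.h5', 'r') as f:
    flags = f['chunk_flags'][...]
    for i, j in zip(*np.nonzero(flags)):
        pass  # read chunk (i, j) of f['matrix'] and feed the multiply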

Aidan Plenert Macdonald
Website <http://acsweb.ucsd.edu/~amacdona/>

On Wed, Aug 12, 2015 at 12:00 PM, Miller, Mark C. <mille...@llnl.gov> wrote:

> Hmm. Is a distributed parallel solution an option for you? Or, are you by
> chance attempting this on a GPU? ~1 TB is 10^6 unknowns, or thereabouts.
> That is a fairly large matrix to attempt in a single-node setting (hence
> the out-of-core approach you are taking, I guess).
>
> Also, I wouldn't necessarily read chunks one by one. Read, say, 1,000
> chunks at a time, tearing down the empty ones (e.g., all-fill) as you go.
>
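> (An untested h5py sketch of the above, assuming the 1-row chunking
> described below and a fill value of zero:)
>
> import h5py
>
> BATCH = 1000  # chunks per read
>
> with h5py.File('matrix.h5', 'r') as f:
>     dset = f['matrix']
>     _, ccols = dset.chunks                 # chunk shape is (1, ccols)
>     step = BATCH * ccols
>     for r in range(dset.shape[0]):
>         for start in range(0, dset.shape[1], step):
>             strip = dset[r, start:start + step]
>             if not strip.any():            # all fill: tear it down, move on
>                 continue
>             # ... build sparse pieces from the non-empty strip ...
>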
> Do you have any control over the data in the file? If so, why not have the
> data producer write an auxiliary dataset to the file: a bit-field that is
> one for a non-empty chunk in the matrix and zero for an empty chunk. You
> could read that dataset first (it's small). Then, use it to bootstrap the
> read and compute process.
>
> Just spit-ballin' here ;)
>
> Mark
>
>
> From: Hdf-forum <hdf-forum-boun...@lists.hdfgroup.org> on behalf of Aidan
> Macdonald <aidan.plenert.macdon...@gmail.com>
> Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
> Date: Wednesday, August 12, 2015 11:50 AM
> To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
> Subject: Re: [Hdf-forum] Fast Sparse Matrix Products by Finding Allocated
> Chunks
>
> To respond to David Schneider's comment, I thought about this, but it
> requires, as far as I can tell, that I create a bunch of matrices in the
> HDF5 file and use them as part of the sparse index labels.
>
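> (If I understand that suggestion correctly, it amounts to storing the
> arrays of a compressed-sparse-row matrix as separate HDF5 datasets; a toy
> sketch with made-up names:)
>
> import numpy as np
> import h5py
> from scipy.sparse import csr_matrix
>
> m = csr_matrix(np.eye(4))       # toy stand-in for the real matrix
> with h5py.File('sparse.h5', 'w') as f:
>     f['data'] = m.data          # the nonzero values
>     f['indices'] = m.indices    # column index of each value
>     f['indptr'] = m.indptr      # row r's entries span [indptr[r], indptr[r+1])
>
> # Reading one row back needs only its slice of these arrays:
> with h5py.File('sparse.h5', 'r') as f:
>     lo, hi = f['indptr'][2], f['indptr'][3]
>     row_vals, row_cols = f['data'][lo:hi], f['indices'][lo:hi]
>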
> Also, loading into memory is not possible, as the matrix is way too big:
> ~10 GB now, and eventually ~1 TB. I need to do everything out of core.
>
>> The downside is that you wind up instantiating fully-populated blocks in
>> memory when the block didn't exist in the file. But, if you do this
>> piece-wise as opposed to a single H5Dread call for the entire matrix *and*
>> if you design your H5Dread calls to align with one or more full blocks, you
>> can build up the equivalent sparse representation in memory *without*
>> paying an initial price of instantiating a full, dense matrix and then
>> tearing it down to build the final sparse representation.
>
>
> One of the major problems is the dimensionality. For a *small* dataset,
> we have 4,000,000 columns, eventually scaling to ~10^9 columns. Note that
> this is sparse data, so in any given row most columns are zero. I am
> blocking in chunks of 1 row and several columns, so simply reading in every
> chunk one by one would still take way too long (read time, not a memory
> issue).
>
> I like the ideas, but I am afraid that the compute time will be way too
> much. Currently I am at ~7 seconds for a read-and-multiply per row, and
> with millions to billions of rows, that is too long. Perhaps I didn't
> understand you correctly.
>
> Thanks for helping too.
>
> Aidan Plenert Macdonald
> Website <http://acsweb.ucsd.edu/~amacdona/>
>
> On Wed, Aug 12, 2015 at 11:35 AM, Aidan Macdonald <
> aidan.plenert.macdon...@gmail.com> wrote:
>
>>> Is this what you mean by a 'sparse format'?
>>
>>
>> Yes, exactly.
>>
>>> However, I am not sure why you need to know how HDF5 has handled the
>>> chunks *in the file*, unless you are attempting to write an out-of-core
>>> matrix multiply.
>>
>>
>> Yes, I am trying to write an out-of-core matrix multiply.
>>
>>> I think you can easily determine which blocks are 'empty' by examining
>>> a block you've read into memory for all fill values or not. Any block which
>>> consists entirely of fill values is, of course, an empty block. And then
>>> you can use that information to help bootstrap your sparse matrix multiply.
>>> So, you could maybe read the matrix several blocks at a time, rather than
>>> all at once, examining returned blocks for all-fill-value or not and then
>>> building up your sparse in-memory representation from that. If you read the
>>> matrix in one H5Dread call, however, then you'd wind up with a
>>> fully instantiated matrix with many fill values in memory *before* you
>>> could begin to reduce that storage to a sparse format.
>>
>>
>> I think, but can't prove, that if I did the check, I would spend more
>> CPU cycles than I would save, because I need to read, check, and then dump
>> or multiply out. I was hoping for something that could give me a list of
>> the allocated chunks; then I could do a "dictionary of keys"
>> <https://en.wikipedia.org/wiki/Sparse_matrix#Dictionary_of_keys_.28DOK.29>
>> block matrix multiplication.
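>>
>> (Roughly the kind of thing I mean; a pure-Python sketch with dense NumPy
>> blocks keyed by their block coordinates:)
>>
>> from collections import defaultdict
>> import numpy as np
>>
>> def block_dok_matmul(A, B):
>>     """C = A * B for block matrices stored as {(I, J): dense block}."""
>>     # Group B's blocks by block-row so only matching pairs are visited.
>>     B_by_row = defaultdict(list)
>>     for (K, J), b in B.items():
>>         B_by_row[K].append((J, b))
>>     C = {}
>>     for (I, K), a in A.items():
>>         for J, b in B_by_row[K]:
>>             ab = np.dot(a, b)
>>             if (I, J) in C:
>>                 C[I, J] += ab
>>             else:
>>                 C[I, J] = ab
>>     return C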
>>
>> According to Table 15 here
>> <https://www.hdfgroup.org/HDF5/doc/UG/10_Datasets.html>, if the space is
>> not allocated, then an error is thrown. So perhaps triggering this error
>> will be faster than actually reading the data from disk. But to do that, I
>> need the fill_value to be undefined. I was hoping for a better way to see
>> whether a chunk is actually allocated.
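>>
>> (If Table 15 holds, something like this might work from h5py; untested, and
>> it assumes the HDF5 error surfaces as a Python exception:)
>>
>> def read_chunk_or_none(dset, i, j):
>>     """Return chunk (i, j) as an array, or None if it isn't allocated."""
>>     crows, ccols = dset.chunks
>>     try:
>>         return dset[i*crows:(i+1)*crows, j*ccols:(j+1)*ccols]
>>     except Exception:   # reading an unallocated chunk errors out
>>         return None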
>>
>> The best way, in my mind, would be to get some sort of list of all the
>> chunks that are allocated, or an iterator that goes through them; then I
>> can do the block matrix multiplication.
>>
>> Aidan Plenert Macdonald
>> Website <http://acsweb.ucsd.edu/~amacdona/>
>>
>> On Wed, Aug 12, 2015 at 10:16 AM, Miller, Mark C. <mille...@llnl.gov>
>> wrote:
>>
>>> Have a look at this reference . . .
>>>
>>> http://www.hdfgroup.org/HDF5/doc_resource/H5Fill_Values.html
>>>
>>> as well as documentation on H5Pset_fill_value and H5Pset_fill_time.
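>>>
>>> (In h5py, H5Pset_fill_value maps onto a dataset-creation keyword; an
>>> illustrative sketch with a made-up chunk shape:)
>>>
>>> import h5py
>>>
>>> with h5py.File('matrix.h5', 'w') as f:
>>>     # Unwritten chunks read back as fillvalue and need not be stored.
>>>     dset = f.create_dataset('matrix', shape=(10**6, 4 * 10**6),
>>>                             dtype='f8', chunks=(1, 8192), fillvalue=0.0)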
>>>
>>> I have a vague recollection that if you create a large, chunked dataset
>>> but then only write to certain parts of it, HDF5 is smart enough to store
>>> only those chunks in the file that actually have non-fill values within
>>> them. The above ref seems to be consistent with this (except in parallel
>>> I/O settings).
>>>
>>> Is this what you mean by a 'sparse format'?
>>>
>>> However, I am not sure why you need to know how HDF5 has handled the
>>> chunks *in the file*, unless you are attempting to write an out-of-core
>>> matrix multiply.
>>>
>>> I think you can easily determine which blocks are 'empty' by examining a
>>> block you've read into memory for all fill values or not. Any block which
>>> consists entirely of fill values is, of course, an empty block. And then
>>> you can use that information to help bootstrap your sparse matrix multiply.
>>> So, you could maybe read the matrix several blocks at a time, rather than
>>> all at once, examining returned blocks for all-fill-value or not and then
>>> building up your sparse in-memory representation from that. If you read the
>>> matrix in one H5Dread call, however, then you'd wind up with a fully
>>> instantiated matrix with many fill values in memory *before* you could
>>> begin to reduce that storage to a sparse format.
>>>
>>> I wonder if it might be possible to write your own custom 'filter',
>>> applied during H5Dread, that would do all this for you as chunks are
>>> read from the file? It might be.
>>>
>>> Mark
>>>
>>>
>>>
>>> From: Hdf-forum <hdf-forum-boun...@lists.hdfgroup.org> on behalf of
>>> Aidan Macdonald <aidan.plenert.macdon...@gmail.com>
>>> Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
>>> Date: Wednesday, August 12, 2015 9:05 AM
>>> To: "hdf-forum@lists.hdfgroup.org" <hdf-forum@lists.hdfgroup.org>
>>> Subject: [Hdf-forum] Fast Sparse Matrix Products by Finding Allocated
>>> Chunks
>>>
>>> Hi,
>>>
>>> I am using HDF5 through Python's h5py, but I am planning to move to
>>> C/C++.
>>>
>>> I am using HDF5 to store sparse matrices that I need to do matrix
>>> products on. I am using chunked storage, which 'appears' to store the
>>> data in a block-sparse format. PLEASE CONFIRM that this is true. I
>>> couldn't find documentation saying so, but watching file sizes during
>>> data loading, my block-sparse assumption seemed to hold.
>>>
>>> I would like to do matrix multiplies and use the sparsity of the data to
>>> make them go faster. I can handle the algorithmic aspect, but I can't
>>> figure out how to see which chunks are allocated so I can iterate over
>>> them.
>>>
>>> If there is a better way to go about this (existing code!), please let me
>>> know. I am new to HDF5, and thoroughly impressed.
>>>
>>> Thank you,
>>>
>>> Aidan Plenert Macdonald
>>> Website <http://acsweb.ucsd.edu/~amacdona/>
>>>
>>>
>>
>>
>
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
