> Hmm. Is a distributed parallel solution an option for you? Or, are you by chance attempting on a GPU? ~1TB is 10^6 unknowns, or thereabouts. That is a fairly large matrix to attempt in a single-node setting (hence the out-of-core approach you are taking, I guess).
Yeah, I am using MPI to distribute (currently to 20 machines with 8 cores apiece). And no, I am not using a GPU, for two reasons. First, I don't know much about them. And second, I think the orders of magnitude say that it is a bad idea. Remember that the 1 TB is sparse, so we are looking at matrices of size about 10^6 x 10^6 or bigger. A typical row might have 1500 elements and the dimensionality is about 4*10^6. So sparse multiplication pays off a lot.

> Do you have any control over the data in the file? If so, why not have the data producer write an auxiliary dataset to the file; a bit-field, that is one for a non-empty chunk in the matrix and zero for empty chunks. You could read that dataset first (it's small). Then, use it to bootstrap the read and compute process.

This is a good idea, although it means more coding and debugging. I'll try to see if there is a "built-in" way first, then go for this if not (roughly as sketched below). Thanks.

Aidan Plenert Macdonald
Website <http://acsweb.ucsd.edu/~amacdona/>

On Wed, Aug 12, 2015 at 12:00 PM, Miller, Mark C. <mille...@llnl.gov> wrote:

> Hmm. Is a distributed parallel solution an option for you? Or, are you by chance attempting on a GPU? ~1TB is 10^6 unknowns, or thereabouts. That is a fairly large matrix to attempt in a single-node setting (hence the out-of-core approach you are taking, I guess).
>
> Also, I wouldn't necessarily read chunks one by one. Read, say, 1,000 chunks at a time, tearing down the empty ones (e.g. all-fill) as you go.
>
> Do you have any control over the data in the file? If so, why not have the data producer write an auxiliary dataset to the file; a bit-field, that is one for a non-empty chunk in the matrix and zero for empty chunks. You could read that dataset first (it's small). Then, use it to bootstrap the read and compute process.
>
> Just spit-ballin' here ;)
>
> Mark
>
> From: Hdf-forum <hdf-forum-boun...@lists.hdfgroup.org> on behalf of Aidan Macdonald <aidan.plenert.macdon...@gmail.com>
> Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
> Date: Wednesday, August 12, 2015 11:50 AM
> To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
> Subject: Re: [Hdf-forum] Fast Sparse Matrix Products by Finding Allocated Chunks
>
> To respond to David Schneider's comment, I thought about this, but it requires, as far as I can imagine, that I create a bunch of matrices in the HDF5 file and use them as part of the sparse index labels.
>
> Also, loading into memory is not possible, as the matrix is way too big (~10 GB, eventually growing to ~1 TB). I need to do everything out of core.
>
>> The down side is that you wind up instantiating fully-populated blocks in memory when the block didn't exist in the file. But, if you do this piece-wise as opposed to a single H5Dread call for the entire matrix *and* if you design your H5Dread calls to align with one or more full blocks, you can build up the equivalent sparse representation in memory *without* paying an initial price of instantiating a full, dense matrix and then tearing it down to build the final sparse representation.
>
> One of the major problems is the dimensionality. For a *small* dataset, we have 4,000,000 columns, eventually scaling to ~10^9 columns; note that this is sparse data, so most columns are zero in any given row. I am blocking in chunks of 1 row and several columns, so simply reading in every chunk one by one would still take way too long (read time, not a memory issue).
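A rough sketch of what such an occupancy dataset could look like in Python/h5py (all file and dataset names, sizes, and the chunk geometry below are invented for illustration; the real producer and consumer would run under MPI):

```python
import numpy as np
import h5py

# Illustrative sizes only; the real matrix is ~10^6 rows x 4*10^6 columns.
N_ROWS, N_COLS = 1_000, 4_000_000
CHUNK = (1, 10_000)                        # 1 row x several columns, as above
N_COL_CHUNKS = N_COLS // CHUNK[1]

# --- producer side: flip a bit for every chunk that actually receives data ---
with h5py.File("matrix.h5", "w") as f:
    dset = f.create_dataset("A", shape=(N_ROWS, N_COLS),
                            dtype="f8", chunks=CHUNK)
    occupancy = np.zeros((N_ROWS, N_COL_CHUNKS), dtype="u1")
    row, col0 = 7, 30_000                  # example write into one chunk
    dset[row, col0:col0 + CHUNK[1]] = 1.0
    occupancy[row, col0 // CHUNK[1]] = 1
    f.create_dataset("A_occupancy", data=occupancy)   # small: 1 byte per chunk

# --- consumer side: read the small bit-field first, touch only non-empty chunks ---
with h5py.File("matrix.h5", "r") as f:
    dset = f["A"]
    occupancy = f["A_occupancy"][...]
    for row, cchunk in zip(*np.nonzero(occupancy)):
        col0 = int(cchunk) * CHUNK[1]
        block = dset[row, col0:col0 + CHUNK[1]]
        # ... hand 'block' to the sparse multiply ...
```

The occupancy table is one byte per chunk versus tens of kilobytes of data per chunk, so it stays small enough to read whole and use as an index for the out-of-core multiply.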
> I like the ideas, but I am afraid that the compute time will be way too much. Currently, I am at 7 sec for a read and multiply per row, and with millions to billions of rows, this is too long. Perhaps I didn't understand you correctly.
>
> Thanks for helping too.
>
> Aidan Plenert Macdonald
> Website <http://acsweb.ucsd.edu/~amacdona/>
>
> On Wed, Aug 12, 2015 at 11:35 AM, Aidan Macdonald <aidan.plenert.macdon...@gmail.com> wrote:
>
>>> Is this what you mean by a 'sparse format'?
>>
>> Yes, exactly.
>>
>>> However, I am not sure why you need to know how HDF5 has handled the chunks *in the file*, unless you are attempting to write an out-of-core matrix multiply.
>>
>> Yes, I am trying to write an out-of-core matrix multiply.
>>
>>> I think you can easily determine which blocks are 'empty' by examining a block you've read into memory for all fill value or not. Any block which consists entirely of fill-value is, of course, an empty block. And, then you can use that information to help bootstrap your sparse matrix multiply. So, you could maybe read the matrix several blocks at a time, rather than all at once, examining returned blocks for all-fill-value or not and then building up your sparse in-memory representation from that. If you read the matrix in one H5Dread call, however, then you'd wind up with a fully instantiated matrix with many fill values in memory *before* you could begin to reduce that storage to a sparse format.
>>
>> I think, but can't prove, that if I did the check, I would burn more CPU cycles than I would save, because I need to read, check, and then either dump or multiply out. I was hoping for something that could give me a list of the allocated chunks; then I could do a "dictionary of keys" <https://en.wikipedia.org/wiki/Sparse_matrix#Dictionary_of_keys_.28DOK.29> block matrix multiplication.
>>
>> According to Table 15 here <https://www.hdfgroup.org/HDF5/doc/UG/10_Datasets.html>, if the space is not allocated, then an error is thrown. So perhaps this error will be faster than actually reading the data from disk. But to do that, I need the fill_value undefined. I was hoping for a better way to see if the chunk is actually allocated.
>>
>> The best way in my mind is to get some sort of list of all the chunks that are allocated, or an iterator that goes through them; then I can do the block matrix multiplication.
>>
>> Aidan Plenert Macdonald
>> Website <http://acsweb.ucsd.edu/~amacdona/>
>>
>> On Wed, Aug 12, 2015 at 10:16 AM, Miller, Mark C. <mille...@llnl.gov> wrote:
>>
>>> Have a look at this reference . . .
>>>
>>> http://www.hdfgroup.org/HDF5/doc_resource/H5Fill_Values.html
>>>
>>> as well as documentation on H5Pset_fill_value and H5Pset_fill_time.
>>>
>>> I have a vague recollection that if you create a large, chunked dataset but then only write to certain parts of it, HDF5 is smart enough to store only those chunks in the file that actually have non-fill values within them. The above ref seems to be consistent with this (except in parallel I/O settings).
>>>
>>> Is this what you mean by a 'sparse format'?
>>>
>>> However, I am not sure why you need to know how HDF5 has handled the chunks *in the file*, unless you are attempting to write an out-of-core matrix multiply.
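A small sketch of how that "only the written chunks take up space" behavior can be checked empirically from h5py (the file name and sizes are invented; `fillvalue=` and `get_storage_size()` wrap H5Pset_fill_value and H5Dget_storage_size):

```python
import numpy as np
import h5py

# Illustrative sizes only; the real dataset is far larger.
shape, chunk = (1_000, 100_000), (1, 10_000)

with h5py.File("sparse_check.h5", "w") as f:
    # fillvalue maps to H5Pset_fill_value; with default (serial) allocation,
    # chunks that are never written should not be stored in the file.
    dset = f.create_dataset("A", shape=shape, dtype="f8",
                            chunks=chunk, fillvalue=0.0)
    dset[0, :10_000] = np.random.rand(10_000)    # touches one chunk
    dset[500, 20_000:30_000] = 1.0               # touches one more

    dense_bytes = int(np.prod(shape)) * 8
    stored_bytes = dset.id.get_storage_size()    # H5Dget_storage_size
    print("dense size:  %12d bytes" % dense_bytes)
    print("stored size: %12d bytes" % stored_bytes)  # roughly two chunks' worth
```

If the reported storage is close to two chunks rather than the full dense size, the "block sparse on disk" picture holds for that file.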
>>> I think you can easily determine which blocks are 'empty' by examining a block you've read into memory for all fill value or not. Any block which consists entirely of fill-value is, of course, an empty block. And, then you can use that information to help bootstrap your sparse matrix multiply. So, you could maybe read the matrix several blocks at a time, rather than all at once, examining returned blocks for all-fill-value or not and then building up your sparse in-memory representation from that. If you read the matrix in one H5Dread call, however, then you'd wind up with a fully instantiated matrix with many fill values in memory *before* you could begin to reduce that storage to a sparse format.
>>>
>>> I wonder if it might be possible to write your own custom 'filter' that you applied during H5Dread that would do all this for you as chunks are read from the file? It might be.
>>>
>>> Mark
>>>
>>> From: Hdf-forum <hdf-forum-boun...@lists.hdfgroup.org> on behalf of Aidan Macdonald <aidan.plenert.macdon...@gmail.com>
>>> Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
>>> Date: Wednesday, August 12, 2015 9:05 AM
>>> To: "hdf-forum@lists.hdfgroup.org" <hdf-forum@lists.hdfgroup.org>
>>> Subject: [Hdf-forum] Fast Sparse Matrix Products by Finding Allocated Chunks
>>>
>>> Hi,
>>>
>>> I am using Python h5py to use HDF5, but I am planning on pushing into C/C++.
>>>
>>> I am using HDF5 to store sparse matrices which I need to do matrix products on. I am using chunked storage, which 'appears' to be storing the data in a block sparse format. PLEASE CONFIRM that this is true. I couldn't find documentation stating this to be true, but by looking at file sizes during data loading, my block sparse assumption seemed to be true.
>>>
>>> I would like to matrix multiply and use the sparsity of the data to make it go faster. I can handle the algorithmic aspect, but I can't figure out how to see which chunks are allocated so I can iterate over these.
>>>
>>> If there is a better way to go at this (existing code!), please let me know. I am new to HDF5, and thoroughly impressed.
>>>
>>> Thank you,
>>>
>>> Aidan Plenert Macdonald
>>> Website <http://acsweb.ucsd.edu/~amacdona/>
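A minimal sketch of the piecewise read-and-multiply approach discussed above, written as a matrix-vector product for brevity (Python/h5py; the function and file names are invented, and a real version would read many chunks per call and distribute rows over MPI ranks):

```python
import numpy as np
import h5py

def out_of_core_matvec(path, dset_name, x, fill=0.0):
    """Compute y = A @ x, reading A one chunk-aligned block at a time
    and skipping blocks that consist entirely of the fill value."""
    with h5py.File(path, "r") as f:
        dset = f[dset_name]
        n_rows, n_cols = dset.shape
        crows, ccols = dset.chunks              # chunk shape, e.g. (1, 10_000)
        y = np.zeros(n_rows, dtype=dset.dtype)
        for r0 in range(0, n_rows, crows):
            for c0 in range(0, n_cols, ccols):
                block = dset[r0:r0 + crows, c0:c0 + ccols]
                if np.all(block == fill):       # the "all-fill" test
                    continue                    # empty block: nothing to add
                y[r0:r0 + block.shape[0]] += block @ x[c0:c0 + block.shape[1]]
    return y
```

Whether the all-fill test pays for itself against the 7 sec/row figure above is exactly the open question in this thread; reading the occupancy from an auxiliary dataset (or from a list of allocated chunks, if one were available) would let the inner loop skip the read entirely for empty blocks.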
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5