Hi Ken,
On May 27, 2010, at 12:20 PM, Ken Sullivan wrote:
> Hi, sorry not to get back sooner; I've found a couple of other interesting
> things. The speed issue doesn't seem to exist on Linux, where the same
> code runs in a blink. On Windows, the really, really slow runs (several
> minutes) only seem to happen when running from within Visual Studio. When
> run from the command line it's slow, e.g. 15 seconds for 4000 vectors, but
> not minutes slow, and the time doesn't seem to grow as I saw before with
> Visual Studio.
Hmm, OK, I'll note that in our bug report. Sounds pretty
Windows-specific, though...
Quincey
> #ifdef __cplusplus
> extern "C" {
> #endif
> #include <hdf5.h>
> #ifdef __cplusplus
> }
> #endif
> #include <vector>
> #include <iostream>
> #include <stdlib.h>
> #include <math.h>
>
> using namespace std;
>
> int main() {
> unsigned long long totalNumVecs = 500000;
> unsigned long long vecLength = 128;
> hid_t baseType = H5T_NATIVE_FLOAT;
>
> unsigned long long roughNumVecsToGet = 4000;
> unsigned long long skipRate = (unsigned long long)ceilf((float)totalNumVecs
> / (float)roughNumVecsToGet);
> vector<unsigned long long> vecInds;
> // use an unsigned counter to avoid a signed/unsigned comparison
> for (unsigned long long rowInd = 0; rowInd < totalNumVecs; rowInd += skipRate) {
> vecInds.push_back(rowInd);
> }
>
> int rank = 2;
> hsize_t dims[2];
> dims[0] = totalNumVecs;
> dims[1] = vecLength;
> hid_t fileSpaceId = H5Screate_simple(rank, dims, NULL);
>
> hsize_t fileBlockCount[2];
> hsize_t fileOffset[2];
>
> hsize_t selectionDims[2];
> selectionDims[0] = 1;
> fileBlockCount[0] = 1;
> fileOffset[0] = vecInds[0];
> for(int ir = 1; ir < rank; ++ir) {
> selectionDims[ir] = dims[ir];
> fileBlockCount[ir] = 1;
> fileOffset[ir] = 0;
> }
>
> cout << "begin hyperslab building" << endl;
> H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_SET, fileOffset, NULL,
> fileBlockCount, selectionDims);
> unsigned long long numVecsToRead = vecInds.size();
> for (hsize_t id=1; id < numVecsToRead; ++id) {
> if ( (id % 50) == 0) {
> cout << id << "/" << numVecsToRead << endl;
> }
> fileOffset[0] = vecInds[id];
> H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_OR, fileOffset, NULL,
> fileBlockCount, selectionDims);
> }
> cout << "end hyperslab building" << endl;
>
> H5Sclose(fileSpaceId);
>
> return 0;
> }
>
>
> Thanks,
> Ken
>
>
> On Wed, May 26, 2010 at 8:17 AM, Quincey Koziol <[email protected]> wrote:
> Hi Ken,
>
> On May 25, 2010, at 5:36 PM, Ken Sullivan wrote:
>
> > Hi, I'm running into slow performance when selecting many (>1000)
> > non-consecutive rows from a 2-dimensional matrix, typically ~500,000 x 100.
> > The bottleneck is the for loop where each row vector index is OR'ed into
> > the hyperslab, i.e.:
> >
> > LOG4CXX_INFO(logger,"TIME begin hyperslab building"); //print out with
> > time stamp
> > //select file buffer hyperslabs
> > H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_SET, (const hsize_t*)
> > fileOffset, NULL, (const hsize_t*) fileBlockCount, selectionDims);
> > for (hsize_t id = 1; id < numVecsToRead; ++id) {
> > LOG4CXX_INFO(logger, id << "/" << numVecsToRead);
> > fileOffset[0] = fileLocs1Dim[id];
> > H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_OR, (const hsize_t*)
> > fileOffset, NULL, (const hsize_t*) fileBlockCount, selectionDims);
> > }
> > LOG4CXX_INFO(logger,"TIME end hyperslab building");
> >
> > One interesting thing is that the time per iteration increases as the
> > loop progresses, e.g. no time at all between 1-2-3-4-5, but seconds
> > between 1000-1001-1002. So the time to select the hyperslab grows worse
> > than linearly, and can become amazingly time consuming, e.g. >10 minutes
> > (!) for a few thousand rows. The read itself is very quick.
>
> Drat! Sounds like we've got an O(n^2) algorithm (or worse) somewhere
> in the code that combines two selections. Can you send us a standalone
> program that demonstrates the problem, so we can file an issue for this, and
> get it fixed?
>
> > My current workaround is to check whether the number of vectors to select
> > is greater than a heuristically determined threshold above which reading
> > the entire file (half a million row vectors) and copying the requested
> > vectors is faster than running the hyperslab selection. Generally the
> > threshold works out to ~500 vecs/0.5 seconds.
> >
> > While poking around the code, I found a similar function,
> > H5Scombine_hyperslab(), that is only compiled if NEW_HYPERSLAB_API is
> > defined. Using this significantly reduced the selection time; in
> > particular, the time for each OR-ing seemed constant, so 2000 vectors took
> > twice as long as 1000, rather than many times longer as with
> > H5Sselect_hyperslab(). However, it's still tens of seconds for a
> > few-thousand-vector selection, so it's still much quicker to read all and
> > copy (~1/2 second). Reading all and copying is not an ideal solution, as
> > it requires malloc/free of ~250MB unnecessarily, though if I use
> > H5Scombine_hyperslab() the crossover number goes up, i.e. above 500, and
> > the workaround is less likely to be needed. I'm a bit nervous, however,
> > about using this undocumented code.
> >
> > So...am I doing something wrong? Is there a speedy way to select a
> > hyperslab consisting of 100s or 1000s of non-consecutive vectors? Is
> > NEW_HYPERSLAB_API safe?
>
> Currently, the NEW_HYPERSLAB_API is not tested or supported, so I
> wouldn't use it.
>
> Quincey
>
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>