Yeah, we did try to scrutinize our opens/closes, and we actually found a
missing close that was causing a small leak. Unfortunately, even the code
above runs slowly, and I think the only open handle there is fileSpaceId
(which I've since made sure to close in the demo program, for good
measure).
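
(For anyone else chasing leaks like this: the library itself can report
what is still open. A minimal sketch, where fileId is a hypothetical
already-open file handle; anything open against the file beyond the file
handle itself is a candidate leak:)

  cout << "all:        " << H5Fget_obj_count(fileId, H5F_OBJ_ALL) << endl;
  cout << "datasets:   " << H5Fget_obj_count(fileId, H5F_OBJ_DATASET) << endl;
  cout << "attributes: " << H5Fget_obj_count(fileId, H5F_OBJ_ATTR) << endl;
  cout << "groups:     " << H5Fget_obj_count(fileId, H5F_OBJ_GROUP) << endl;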

On Thu, May 27, 2010 at 1:17 PM, <[email protected]> wrote:

>
> FYI...  I had thought I was experiencing a very similar problem under
> Linux: as my loop progressed, my performance writing via hyperslab grew
> worse and worse.  After further troubleshooting and profiling, I
> discovered that I had missed some H5Dclose() and H5Aclose() calls.
> Fixing that made a HUGE difference in my test case (from 52 secs to
> approximately 2.5 secs for about 8 MB of data).
>
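> (Roughly what the fix looked like; a sketch with made-up names, the key
> point being the two close calls inside the loop:)
>
>   for (int step = 0; step < numSteps; ++step) {
>     hid_t dsetId = H5Dopen2(fileId, dsetNames[step], H5P_DEFAULT);
>     hid_t attrId = H5Aopen(dsetId, "timestamp", H5P_DEFAULT);
>     // ... write this step's data via hyperslab ...
>     H5Aclose(attrId);  // these were the missing calls; without them
>     H5Dclose(dsetId);  // handles pile up and each iteration slows down
>   }
>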
> Kirk
>
>
> > Hi Ken,
> >
> > On May 27, 2010, at 12:20 PM, Ken Sullivan wrote:
> >
> >> Hi, sorry not to get back to you sooner; I've found a couple of other
> >> interesting things.  The speed issue doesn't seem to exist on Linux;
> >> the same code runs in a blink.  On Windows, the really, really slow
> >> runs (several minutes) only seem to happen when running from within
> >> Visual Studio.  When run from the command line it's slow, e.g. 15
> >> seconds for 4000 vectors, but not minutes slow, and the time doesn't
> >> seem to grow as I saw before with Visual Studio.
> >
> >       Hmm, OK, I'll note that in our bug report.  It sounds pretty
> > Windows-specific, though...
> >
> >       Quincey
> >
> >
> >> #ifdef __cplusplus
> >> extern "C" {
> >> #endif
> >> #include <hdf5.h>
> >> #ifdef __cplusplus
> >> }
> >> #endif
> >> #include <vector>
> >> #include <iostream>
> >> #include <stdlib.h>
> >> #include <math.h>
> >>
> >> using namespace std;
> >>
> >> int main() {
> >>   unsigned long long totalNumVecs = 500000;
> >>   unsigned long long vecLength = 128;
> >>   hid_t baseType = H5T_NATIVE_FLOAT;  // unused in this repro
> >>
> >>   unsigned long long roughNumVecsToGet = 4000;
> >>   unsigned long long skipRate = (unsigned long long)
> >>       ceilf((float)totalNumVecs / (float)roughNumVecsToGet);
> >>   vector<unsigned long long> vecInds;
> >>   // loop variable must be unsigned long long (not int) to match
> >>   // totalNumVecs and skipRate
> >>   for (unsigned long long rowInd = 0; rowInd < totalNumVecs;
> >>        rowInd += skipRate) {
> >>     vecInds.push_back(rowInd);
> >>   }
> >>
> >>   int rank = 2;
> >>   hsize_t dims[2];
> >>   dims[0] = totalNumVecs;
> >>   dims[1] = vecLength;
> >>   hid_t fileSpaceId = H5Screate_simple(rank, dims, NULL);
> >>
> >>   hsize_t fileBlockCount[2];
> >>   hsize_t fileOffset[2];
> >>
> >>   hsize_t selectionDims[2];
> >>   selectionDims[0] = 1;
> >>   fileBlockCount[0] = 1;
> >>   fileOffset[0] = vecInds[0];
> >>   for (int ir = 1; ir < rank; ++ir) {
> >>     selectionDims[ir] = dims[ir];
> >>     fileBlockCount[ir] = 1;
> >>     fileOffset[ir] = 0;
> >>   }
> >>
> >>   cout << "begin hyperslab building" << endl;
> >>   // select the first row, then OR in one 1 x vecLength block per row
> >>   H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_SET, fileOffset, NULL,
> >>                       fileBlockCount, selectionDims);
> >>   unsigned long long numVecsToRead = vecInds.size();
> >>   for (hsize_t id = 1; id < numVecsToRead; ++id) {
> >>     if ((id % 50) == 0) {
> >>       cout << id << "/" << numVecsToRead << endl;
> >>     }
> >>     fileOffset[0] = vecInds[id];
> >>     H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_OR, fileOffset, NULL,
> >>                         fileBlockCount, selectionDims);
> >>   }
> >>   cout << "end hyperslab building" << endl;
> >>
> >>   H5Sclose(fileSpaceId);  // close the dataspace handle
> >>   return 0;
> >> }
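> >>
> >> (Side note: in this demo the row indices happen to be evenly spaced at
> >> skipRate, so the same selection could also be built in a single strided
> >> call; a sketch that only applies when the spacing really is regular:)
> >>
> >>   hsize_t start[2]  = {0, 0};
> >>   hsize_t stride[2] = {(hsize_t)skipRate, 1};
> >>   hsize_t count[2]  = {(hsize_t)vecInds.size(), 1};
> >>   hsize_t block[2]  = {1, (hsize_t)vecLength};
> >>   // one call selects every skipRate-th row in its entirety
> >>   H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_SET, start, stride,
> >>                       count, block);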
> >>
> >>
> >> Thanks,
> >> Ken
> >>
> >>
> >> On Wed, May 26, 2010 at 8:17 AM, Quincey Koziol <[email protected]>
> >> wrote:
> >> Hi Ken,
> >>
> >> On May 25, 2010, at 5:36 PM, Ken Sullivan wrote:
> >>
> >> > Hi, I'm running into slow performance when selecting many (>1000)
> >> > non-consecutive rows from a 2-dimensional matrix, typically ~500,000
> >> > x 100.  The bottleneck is the for loop where each row vector index is
> >> > OR'ed into the hyperslab, i.e.:
> >> >
> >> >   LOG4CXX_INFO(logger, "TIME begin hyperslab building"); // timestamped
> >> >   // select file buffer hyperslabs
> >> >   H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_SET,
> >> >       (const hsize_t*) fileOffset, NULL,
> >> >       (const hsize_t*) fileBlockCount, selectionDims);
> >> >   for (hsize_t id = 1; id < numVecsToRead; ++id) {
> >> >     LOG4CXX_INFO(logger, id << "/" << numVecsToRead);
> >> >     fileOffset[0] = fileLocs1Dim[id];
> >> >     H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_OR,
> >> >         (const hsize_t*) fileOffset, NULL,
> >> >         (const hsize_t*) fileBlockCount, selectionDims);
> >> >   }
> >> >   LOG4CXX_INFO(logger, "TIME end hyperslab building");
> >> >
> >> > One interesting thing is that the time per iteration grows as the
> >> > loop progresses, e.g. no time at all between iterations 1-2-3-4-5,
> >> > but seconds between 1000-1001-1002.  So the time to select the
> >> > hyperslab is worse than linear and can become amazingly time
> >> > consuming, e.g. >10 minutes (!) for a few thousand vectors.  The read
> >> > itself is very quick.
> >>
> >>        Drat!  Sounds like we've got an O(n^2) algorithm (or worse)
> >> somewhere in the code that combines two selections.  Can you send us a
> >> standalone program that demonstrates the problem, so we can file an
> >> issue and get it fixed?
> >>
> >> > My current workaround is to check whether the number of vectors to
> >> > select exceeds a heuristically determined threshold, above which it
> >> > is faster to read the entire file (half a million row vectors) and
> >> > copy out the requested vectors than to run the hyperslab selection.
> >> > The threshold generally works out to ~500 vectors / 0.5 seconds.
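> >> >
> >> > (For concreteness, the read-all-and-copy path is roughly the sketch
> >> > below; dataSetId, totalNumVecs, and vecLength are stand-ins for the
> >> > real handle and sizes, and it assumes <vector> and <cstring>:)
> >> >
> >> >   // pull the whole N x vecLength dataset into memory in one read...
> >> >   std::vector<float> all(totalNumVecs * vecLength);
> >> >   H5Dread(dataSetId, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT,
> >> >           &all[0]);
> >> >   // ...then copy out just the requested row vectors
> >> >   std::vector<float> picked(numVecsToRead * vecLength);
> >> >   for (hsize_t id = 0; id < numVecsToRead; ++id) {
> >> >     memcpy(&picked[id * vecLength], &all[fileLocs1Dim[id] * vecLength],
> >> >            vecLength * sizeof(float));
> >> >   }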
> >> >
> >> > While poking around the code, I found a similar function,
> >> > H5Scombine_hyperslab(), which is only compiled if NEW_HYPERSLAB_API
> >> > is defined.  Using it significantly reduced the selection time; in
> >> > particular, the time for each OR-ing seemed constant, so 2000 vectors
> >> > took twice as long as 1000, not many times as long as with
> >> > H5Sselect_hyperslab().  However, it still takes tens of seconds for a
> >> > few-thousand-vector selection, so it's still much quicker to read
> >> > everything and copy (~0.5 seconds).
> >> > Reading everything and copying is not an ideal solution, as it
> >> > requires malloc/free of ~250 MB unnecessarily; and if I use
> >> > H5Scombine_hyperslab(), the crossover number goes up, i.e. above 500
> >> > vectors, so the read-all path is less likely to be needed.  I'm a bit
> >> > nervous, however, about using this undocumented code.
> >> >
> >> > So... am I doing something wrong?  Is there a speedy way to select a
> >> > hyperslab consisting of 100s or 1000s of non-consecutive vectors?  Is
> >> > NEW_HYPERSLAB_API safe?
> >>
> >>        Currently, the NEW_HYPERSLAB_API is not tested or supported, so I
> >> wouldn't use it.
> >>
> >>        Quincey
> >>
> >>