FYI...  I thought I was experiencing a very similar problem under Linux.
As my loop progressed, my performance writing via hyperslab grew worse
and worse. After further troubleshooting and profiling, I discovered
that I had missed some H5Dclose() and H5Aclose() calls. Fixing that made
a HUGE difference in my test case (from 52 secs to approximately 2.5
secs for about 8 MB of data).

Kirk


> Hi Ken,
>
> On May 27, 2010, at 12:20 PM, Ken Sullivan wrote:
>
>> Hi, sorry not to get back sooner; I've found a couple of other
>> interesting things.  The speed issue doesn't seem to exist on Linux,
>> where the same code runs in a blink.  On Windows, the really, really
>> slow runs (several minutes) only seem to happen when running from
>> within Visual Studio.  When run from the command line it's slow, e.g.
>> 15 seconds for 4000 vectors, but not minutes slow, and the time
>> doesn't seem to grow as I saw before with Visual Studio.
>
>       Hmm, OK, I'll note that in our bug report.  Sounds pretty
> Windows-specific, though...
>
>       Quincey
>
>
>> #ifdef __cplusplus
>> extern "C" {
>> #endif
>> #include <hdf5.h>
>> #ifdef __cplusplus
>> }
>> #endif
>> #include <vector>
>> #include <iostream>
>> #include <stdlib.h>
>> #include <math.h>
>>
>> using namespace std;
>>
>> int main() {
>>   unsigned long long totalNumVecs = 500000;
>>   unsigned long long vecLength = 128;
>>   hid_t baseType = H5T_NATIVE_FLOAT;
>>
>>   unsigned long long roughNumVecsToGet = 4000;
>>   unsigned long long skipRate = (unsigned long
>> long)ceilf((float)totalNumVecs / (float)roughNumVecsToGet);
>>   vector<unsigned long long> vecInds;
>>   for (unsigned long long rowInd = 0; rowInd < totalNumVecs; rowInd += skipRate) {
>>     vecInds.push_back(rowInd);
>>   }
>>
>>   int rank = 2;
>>   hsize_t dims[2];
>>   dims[0] = totalNumVecs;
>>   dims[1] = vecLength;
>>   hid_t fileSpaceId = H5Screate_simple(rank, dims, NULL);
>>
>>   hsize_t fileBlockCount[2];
>>   hsize_t fileOffset[2];
>>
>>   hsize_t selectionDims[2];
>>   selectionDims[0] = 1;
>>   fileBlockCount[0] = 1;
>>   fileOffset[0] = vecInds[0];
>>   for(int ir = 1; ir < rank; ++ir) {
>>     selectionDims[ir] = dims[ir];
>>     fileBlockCount[ir] = 1;
>>     fileOffset[ir] = 0;
>>   }
>>
>>   cout << "begin hyperslab building" << endl;
>>   H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_SET, fileOffset, NULL,
>>                       fileBlockCount, selectionDims);
>>   unsigned long long numVecsToRead = vecInds.size();
>>   for (hsize_t id=1; id < numVecsToRead; ++id) {
>>     if ( (id % 50) == 0) {
>>       cout << id << "/" << numVecsToRead << endl;
>>     }
>>     fileOffset[0] = vecInds[id];
>>     H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_OR, fileOffset, NULL,
>>                         fileBlockCount, selectionDims);
>>   }
>>   cout << "end hyperslab building" << endl;
>>
>>   H5Sclose(fileSpaceId);
>>
>>   return 0;
>> }
>>
>>
>> Thanks,
>> Ken
>>
>>
>> On Wed, May 26, 2010 at 8:17 AM, Quincey Koziol <[email protected]>
>> wrote:
>> Hi Ken,
>>
>> On May 25, 2010, at 5:36 PM, Ken Sullivan wrote:
>>
>> > Hi, I'm running into slow performance when selecting many (>1000)
>> > non-consecutive rows from a 2-dimensional matrix, typically ~500,000
>> > x 100.  The bottleneck is the for loop where each row vector index
>> > is OR'ed into the hyperslab, i.e.:
>> >
>> >   LOG4CXX_INFO(logger, "TIME begin hyperslab building"); // print with timestamp
>> >   // select file buffer hyperslabs
>> >   H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_SET, fileOffset, NULL,
>> >                       fileBlockCount, selectionDims);
>> >   for (hsize_t id = 1; id < numVecsToRead; ++id) {
>> >     LOG4CXX_INFO(logger, id << "/" << numVecsToRead);
>> >     fileOffset[0] = fileLocs1Dim[id];
>> >     H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_OR, fileOffset, NULL,
>> >                         fileBlockCount, selectionDims);
>> >   }
>> >   LOG4CXX_INFO(logger, "TIME end hyperslab building");
>> >   LOG4CXX_INFO(logger,"TIME end hyperslab building");
>> >
>> > One interesting thing is that the time per iteration increases as
>> > the loop progresses, e.g. no time at all between iterations
>> > 1-2-3-4-5, but seconds between 1000-1001-1002. So the time to select
>> > the hyperslab grows worse than linearly and can become amazingly
>> > time consuming, e.g. >10 minutes (!) for a few thousand rows. The
>> > read itself is very quick.
>>
>>        Drat!  Sounds like we've got an O(n^2) algorithm (or worse)
>> somewhere in the code that combines two selections.  Can you send
>> us a standalone program that demonstrates the problem, so we can
>> file an issue for this, and get it fixed?
>>
>> > My current workaround is to check whether the number of vectors to
>> > select is greater than a heuristically determined threshold, above
>> > which it is faster to read the entire file (half a million row
>> > vectors) and copy out the requested vectors than to run the
>> > hyperslab selection.  Generally that threshold works out to ~500
>> > vectors (~0.5 seconds).
>> >
>> > While poking around the code, I found a similar function,
>> > H5Scombine_hyperslab(), which is only compiled if NEW_HYPERSLAB_API
>> > is defined.  Using it significantly reduced the selection time; in
>> > particular, the time for each OR-ing seemed constant, so 2000
>> > vectors took twice as long as 1000, rather than many times longer as
>> > with H5Sselect_hyperslab().  However, it is still tens of seconds
>> > for a few-thousand-vector selection, so it is still much quicker to
>> > read all and copy (~1/2 second).
>> > Reading all and copying is not an ideal solution, as it requires an
>> > unnecessary malloc/free of ~250 MB, and if I use
>> > H5Scombine_hyperslab() the crossover number goes up (i.e. above
>> > 500), so reading all is less likely to be needed.  I'm a bit
>> > nervous, however, about using this undocumented code.
>> >
>> > So...am I doing something wrong?  Is there a speedy way to select a
>> hyperslab consisting of 100s or 1000s of non-consecutive vectors?  Is
>> NEW_HYPERSLAB_API safe?
>>
>>        Currently, the NEW_HYPERSLAB_API is not tested or supported, so I
>> wouldn't use it.
>>
>>        Quincey
>>
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>


