FYI... I thought I was experiencing a very similar problem under Linux.
As my loop progressed, my performance writing via hyperslab grew worse
and worse. After further troubleshooting and profiling, I discovered
that I had missed some H5Dclose() and H5Aclose() calls. Fixing that made
a HUGE difference in my test case (from 52 seconds to approximately
2.5 seconds for about 8 MB of data).
Kirk
> Hi Ken,
>
> On May 27, 2010, at 12:20 PM, Ken Sullivan wrote:
>
>> Hi, sorry not to get back sooner; I've found a couple of other
>> interesting things. The speed issue doesn't seem to exist on Linux;
>> the same code runs in a blink. On Windows, the really, really slow
>> runs (several minutes) only seem to happen when running from within
>> Visual Studio. When run from the command line it's slow, e.g. 15
>> seconds for 4000 vectors, but not minutes slow, and the time doesn't
>> seem to grow as I saw before with Visual Studio.
>
> Hmm, OK, I'll note that in our bug report. Sounds pretty Windows
> specific though...
>
> Quincey
>
>
>> #ifdef __cplusplus
>> extern "C" {
>> #endif
>> #include <hdf5.h>
>> #ifdef __cplusplus
>> }
>> #endif
>> #include <vector>
>> #include <iostream>
>> #include <stdlib.h>
>> #include <math.h>
>>
>> using namespace std;
>>
>> int main() {
>>     unsigned long long totalNumVecs = 500000;
>>     unsigned long long vecLength = 128;
>>
>>     // Pick roughly every Nth row so we end up with ~4000 indices.
>>     unsigned long long roughNumVecsToGet = 4000;
>>     unsigned long long skipRate = (unsigned long long)
>>         ceilf((float)totalNumVecs / (float)roughNumVecsToGet);
>>     vector<unsigned long long> vecInds;
>>     for (unsigned long long rowInd = 0; rowInd < totalNumVecs;
>>          rowInd += skipRate) {
>>         vecInds.push_back(rowInd);
>>     }
>>
>>     int rank = 2;
>>     hsize_t dims[2];
>>     dims[0] = totalNumVecs;
>>     dims[1] = vecLength;
>>     hid_t fileSpaceId = H5Screate_simple(rank, dims, NULL);
>>
>>     hsize_t fileBlockCount[2];
>>     hsize_t fileOffset[2];
>>     hsize_t selectionDims[2];
>>
>>     // Each selected block is one full row.
>>     selectionDims[0] = 1;
>>     fileBlockCount[0] = 1;
>>     fileOffset[0] = vecInds[0];
>>     for (int ir = 1; ir < rank; ++ir) {
>>         selectionDims[ir] = dims[ir];
>>         fileBlockCount[ir] = 1;
>>         fileOffset[ir] = 0;
>>     }
>>
>>     cout << "begin hyperslab building" << endl;
>>     H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_SET, fileOffset, NULL,
>>                         fileBlockCount, selectionDims);
>>     unsigned long long numVecsToRead = vecInds.size();
>>     for (hsize_t id = 1; id < numVecsToRead; ++id) {
>>         if ((id % 50) == 0) {
>>             cout << id << "/" << numVecsToRead << endl;
>>         }
>>         fileOffset[0] = vecInds[id];
>>         H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_OR, fileOffset, NULL,
>>                             fileBlockCount, selectionDims);
>>     }
>>     cout << "end hyperslab building" << endl;
>>
>>     H5Sclose(fileSpaceId);
>>     return 0;
>> }
>>
>>
>> Thanks,
>> Ken
>>
>>
>> On Wed, May 26, 2010 at 8:17 AM, Quincey Koziol <[email protected]>
>> wrote:
>> Hi Ken,
>>
>> On May 25, 2010, at 5:36 PM, Ken Sullivan wrote:
>>
>> > Hi, I'm running into slow performance when selecting several (>1000)
>> > non-consecutive rows from a 2-dimensional matrix, typically
>> > ~500,000 x 100. The bottleneck is the for loop where each row vector
>> > index is OR'ed into the hyperslab, i.e.:
>> >
>> > LOG4CXX_INFO(logger, "TIME begin hyperslab building"); // print with timestamp
>> > // select file buffer hyperslabs
>> > H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_SET, (const hsize_t*) fileOffset,
>> >                     NULL, (const hsize_t*) fileBlockCount, selectionDims);
>> > for (hsize_t id = 1; id < numVecsToRead; ++id) {
>> >     LOG4CXX_INFO(logger, id << "/" << numVecsToRead);
>> >     fileOffset[0] = fileLocs1Dim[id];
>> >     H5Sselect_hyperslab(fileSpaceId, H5S_SELECT_OR, (const hsize_t*) fileOffset,
>> >                         NULL, (const hsize_t*) fileBlockCount, selectionDims);
>> > }
>> > LOG4CXX_INFO(logger, "TIME end hyperslab building");
>> >
>> > One interesting thing is that the time per iteration increases as the
>> > loop progresses, e.g. no time at all between iterations 1-2-3-4-5,
>> > but seconds between 1000-1001-1002. So the time to select the
>> > hyperslab grows worse than linearly, and can become amazingly time
>> > consuming, e.g. >10 minutes (!) for a few thousand rows. The read
>> > itself is very quick.
>>
>> Drat! Sounds like we've got an O(n^2) algorithm (or worse)
>> somewhere in the code that combines two selections. Can you send
>> us a standalone program that demonstrates the problem, so we can
>> file an issue for this, and get it fixed?
>>
>> > My current workaround is to check whether the number of vectors to
>> > select is greater than a heuristically determined threshold, beyond
>> > which reading the entire file (half a million row vectors) and
>> > copying out the requested vectors is faster than running the
>> > hyperslab selection. Generally the crossover works out to
>> > ~500 vectors / 0.5 seconds.
>> >
>> > While poking around the code, I found a similar function,
>> > H5Scombine_hyperslab(), that is only compiled if NEW_HYPERSLAB_API
>> > is defined. Using it significantly reduced the selection time; in
>> > particular, the time for each OR-ing seemed constant, so 2000
>> > vectors took twice as long as 1000, not many times longer as with
>> > H5Sselect_hyperslab(). However, it's still tens of seconds for a
>> > few-thousand-vector selection, so it's still much quicker to read
>> > everything and copy (~1/2 second).
>> > Reading everything and copying is not an ideal solution, as it
>> > requires malloc/free of ~250 MB unnecessarily, and if I use
>> > H5Scombine_hyperslab() the crossover number goes up, i.e. above 500,
>> > so it's less often needed. I'm a bit nervous, however, about using
>> > this undocumented code.
>> >
>> > So... am I doing something wrong? Is there a speedy way to select a
>> > hyperslab consisting of 100s or 1000s of non-consecutive vectors?
>> > Is NEW_HYPERSLAB_API safe?
>>
>> Currently, the NEW_HYPERSLAB_API is not tested or supported, so I
>> wouldn't use it.
>>
>> Quincey
>>
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>