You should hoist as much as possible out of the while loop for better performance, e.g. the H5Gopen(), H5Dopen2(), and H5Screate_simple() calls.
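A rough sketch of that restructuring is below. It creates the 1x1x1 memory dataspace once and reopens the group/dataset only when the group name changes, which assumes the input file is grouped by chromosome (if it isn't, you'd want a small handle cache instead). Variable names (file, baseOut, line, sep, LINE_MAX) are taken from your code; error checking is omitted for brevity, so treat this as illustrative, not drop-in:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hdf5.h"

hsize_t dims[3]  = {1, 1, 1};
hsize_t count[2] = {1, 1};
hsize_t start[2] = {0, 0};
float   data[1][1][1];
char    cur[64] = "";                  /* name of the group currently open */
hid_t   group = -1, dataset = -1, dataspace = -1;

/* Create the 1x1x1 memory dataspace once, not once per line. */
hid_t memspace = H5Screate_simple(3, dims, NULL);

while (fgets(line, LINE_MAX, baseOut) != NULL) {
    char *tc = strtok(line, sep);
    if (strcmp(tc, "chr") == 0)
        continue;                      /* skip header line */

    if (strcmp(tc, cur) != 0) {        /* group changed: swap handles */
        if (group >= 0) {
            H5Sclose(dataspace);
            H5Dclose(dataset);
            H5Gclose(group);
        }
        group     = H5Gopen(file, tc, H5P_DEFAULT);
        dataset   = H5Dopen2(group, "positions", H5P_DEFAULT);
        dataspace = H5Dget_space(dataset);
        strncpy(cur, tc, sizeof cur - 1);
    }

    start[0] = atoi(strtok(NULL, sep));
    strtok(NULL, sep);                 /* skip the unused third column */
    data[0][0][0] = atof(strtok(NULL, sep));

    H5Sselect_hyperslab(dataspace, H5S_SELECT_SET, start, NULL, count, NULL);
    H5Dwrite(dataset, H5T_NATIVE_FLOAT, memspace, dataspace, H5P_DEFAULT, data);
}

H5Sclose(memspace);
if (group >= 0) {
    H5Sclose(dataspace);
    H5Dclose(dataset);
    H5Gclose(group);
}
H5Fclose(file);
fclose(baseOut);
```

If many lines share the same group, you could also buffer the (coordinate, value) pairs and issue fewer, larger writes, e.g. with H5Sselect_elements on a batch of points instead of one hyperslab per point.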

Thanks
--pc


Paul Zumbo wrote:
Hello,

I have an integer dataset, say 500 million by 1.

Each row in that dataset represents a discrete point.
I read in a file which contains both points and values for *some* of those 
points in the dataset.

I want to write, for each point in my file, a value in the dataset.

My current code is similar to what follows:

while (fgets(line, LINE_MAX, baseOut) != NULL)
{
        tc = strtok(line, sep);
        if (strcmp(tc, "chr") == 0)
        {
                continue;
        }
        tl = strtok(NULL, sep);
        start[0] = atoi(tl);
        start[1] = 0;
        tl = strtok(NULL, sep);
        group = H5Gopen(file, tc, H5P_DEFAULT);
        dataset = H5Dopen2(group, "positions", H5P_DEFAULT);
        count[0] = 1;
        count[1] = 1;
        dims[0] = 1;
        dims[1] = 1;
        tl = strtok(NULL, sep);
        data[0][0][0] = atof(tl);
        memspace = H5Screate_simple(3, dims, NULL);
        dataspace = H5Dget_space(dataset);
        status = H5Sselect_hyperslab(dataspace, H5S_SELECT_SET,
                                     start, NULL, count, NULL);
        status = H5Dwrite(dataset, H5T_NATIVE_FLOAT, memspace,
                          dataspace, H5P_DEFAULT, data);
        status = H5Sclose(memspace);
        status = H5Sclose(dataspace);
        status = H5Dclose(dataset);
        status = H5Gclose(group);
}
status = H5Fclose(file);
fclose(baseOut);

It basically loops through the file, opens the correct group, opens the correct dataset, reads a point/value, writes the value to the dataset, then closes the dataset, closes the group, and so on.
The problem is the write speed: I recently tested reading a 4.4 GB file and writing to a dataset, and it took about 55 minutes using 1 core on a Lustre filesystem (I'm not sure whether that is on par with expectations?).
I think my implementation is not ideal, since it writes one point at a time, but the data points are always non-contiguous.

Any advice for speeding this up?

Thanks,
-Paul


_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

