Chris,

This is a known problem.

We entered the issue you reported into our database to make sure it is on the radar for the HDF5 improvements. Unfortunately, fixing it for the general case will require a substantial effort, but we will probably be able to improve performance for some simple patterns like yours. We plan to rework the hyperslab selection algorithm (going from O(n^2) to O(1)) in HDF5 1.8.11. This should help, but there will still be cases where performance is bad, because we are "touching every pixel" while building a general selection.

Bottom line: if you want good I/O, don't use non-contiguous selections ;-(
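For a regular pattern like yours there is also an application-side workaround: read the covering contiguous block with one H5Dread and do the subsampling yourself in memory, trading extra memory for the CPU time otherwise spent in H5S_select_iterate. A minimal sketch (untested; read_strided_rows is only an illustrative helper, not an HDF5 API):

/* Sketch: contiguous read plus in-memory subsampling (untested).
 * Reads the whole 2-D dataset with one contiguous H5Dread, then
 * keeps every stride-th row. Trades memory for CPU time. */
#include <stdlib.h>
#include <string.h>
#include "hdf5.h"

int read_strided_rows(hid_t dataset, hsize_t stride, int **out)
{
    hid_t   filespace = H5Dget_space(dataset);
    hsize_t dims[2];
    H5Sget_simple_extent_dims(filespace, dims, NULL);
    H5Sclose(filespace);

    /* One contiguous read of the full extent (H5S_ALL on both sides). */
    int *full = malloc(dims[0] * dims[1] * sizeof(int));
    if (H5Dread(dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                H5P_DEFAULT, full) < 0) {
        free(full);
        return -1;
    }

    /* Keep rows 0, stride, 2*stride, ... in a dense output buffer. */
    hsize_t nrows = dims[0] / stride;
    int *sub = malloc(nrows * dims[1] * sizeof(int));
    for (hsize_t i = 0; i < nrows; i++)
        memcpy(sub + i * dims[1], full + i * stride * dims[1],
               dims[1] * sizeof(int));

    free(full);
    *out = sub;
    return 0;
}

For a 229 MB dataset this roughly doubles the transient memory footprint, but every byte moves in one contiguous request.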
Elena

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal
The HDF Group
http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Nov 6, 2012, at 6:37 PM, Chris LeBlanc wrote:

> Hi,
>
> I think I've come across a performance issue with H5Dread when reading
> non-contiguous hyperslab selections. The use case in my software is a bit
> complicated, so instead I came up with a small example that shows the same
> issue. Please let me know if I'm missing something here; it's possible that
> a different approach would be much better.
>
> In my example I write a 2D chunked dataset of native ints to an HDF5 file
> (adapted from the h5_extend example; it now writes a 229 MB file). I then
> construct a hyperslab selection of the dataset and read it back using a
> single call to H5Dread. When I use a stride of 1 (so all elements of the
> selection are contiguous) the read is very fast. However, when I set the
> stride to 2, the read slows down significantly, on the order of 15 times.
>
> The dataset has a chunk shape of 1000x500, and the 0th dimension is the one
> being tested with a stride of 1 and 2. Is this a typical slowdown for a
> stride of 2? Since the chunk size along that dimension is 1000, a stride of
> 1 and a stride of 2 both have to touch every chunk, and therefore read the
> same amount of data from disk, so I would expect similar performance.
>
> I've run the stride-of-2 scenario under Valgrind (using the callgrind tool)
> for profiling, and it shows that 95% of the time is being spent in
> H5S_select_iterate (I can share the callgrind output if it helps), which
> makes this program CPU bound, not I/O bound.
>
> I'm using an up-to-date version of HDF5 trunk checked out from Subversion.
> I looked at the callback H5D__chunk_io_init() used by H5S_select_iterate().
> I noticed that two different approaches are taken: one for the case where
> the shape of the memory space is the same as the shape of the file
> dataspace selection, and another when the shapes are different. The
> performance drop I've noticed appears to be in the latter case.
>
> Any ideas on how to optimize this function or otherwise increase the
> performance of this use case?
>
> Thanks,
> Chris LeBlanc
>
> --
>
> Here is the example code. I wrote this mail earlier with the code as an
> attachment, but it never appeared on the mailing list, so I'm trying again
> with the code inline:
>
> /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>  * Copyright by The HDF Group.
>  * Copyright by the Board of Trustees of the University of Illinois.
>  * All rights reserved.
>  *
>  * This file is part of HDF5. The full HDF5 copyright notice, including
>  * terms governing use, modification, and redistribution, is contained in
>  * the files COPYING and Copyright.html. COPYING can be found at the root
>  * of the source code distribution tree; Copyright.html can be found at the
>  * root level of an installed copy of the electronic HDF5 document set and
>  * is linked from the top-level documents page. It can also be found at
>  * http://hdfgroup.org/HDF5/doc/Copyright.html. If you do not have
>  * access to either file, you may request a copy from [email protected].
>  * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */
>
> /*
>  * This example shows how to work with extendible datasets. The dataset
>  * must be chunked in order to be extendible.
>  *
>  * It is used in the HDF5 Tutorial.
>  */
>
> /* Modified version of h5_extend.c showing the performance difference
>  * between reading with a stride of 1 vs a stride of 2. */
>
> #include <stdlib.h>
> #include <stdio.h>
> #include <time.h>
> #include "hdf5.h"
>
> #define FILENAME    "extend.h5"
> #define DATASETNAME "ExtendibleArray"
> #define RANK        2
>
> void write_file(void) {
>     hid_t   file, dataspace, dataset, cparms;
>     herr_t  status;
>
>     hsize_t dims[2]       = {20000, 3000};  /* dataset dimensions at creation time */
>     hsize_t maxdims[2]    = {H5S_UNLIMITED, H5S_UNLIMITED};
>     hsize_t chunk_dims[2] = {1000, 500};
>     int    *data = calloc(dims[0] * dims[1], sizeof(int));
>
>     /* Create the data space with unlimited dimensions. */
>     dataspace = H5Screate_simple(RANK, dims, maxdims);
>
>     /* Create a new file. If the file exists, its contents are overwritten. */
>     file = H5Fcreate(FILENAME, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
>
>     /* Modify dataset creation properties, i.e. enable chunking. */
>     cparms = H5Pcreate(H5P_DATASET_CREATE);
>     status = H5Pset_chunk(cparms, RANK, chunk_dims);
>
>     /* Create a new dataset within the file using the cparms
>      * creation properties. */
>     dataset = H5Dcreate2(file, DATASETNAME, H5T_NATIVE_INT, dataspace,
>                          H5P_DEFAULT, cparms, H5P_DEFAULT);
>
>     status = H5Sclose(dataspace);
>
>     /* Write data to the dataset. */
>     status = H5Dwrite(dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
>                       H5P_DEFAULT, data);
>
>     /* Close resources. */
>     status = H5Dclose(dataset);
>     status = H5Fclose(file);
>     status = H5Pclose(cparms);
>     free(data);
> }
>
> void read_file(hsize_t dim1_stride, hsize_t dim2_stride) {
>     hid_t   file, dataset, filespace, memspace;
>     hsize_t dimsr[2];
>     hsize_t memspace_dims[2];
>     hsize_t offsets[2] = {0, 0};
>     hsize_t strides[2] = {dim1_stride, dim2_stride};
>     hsize_t count[2];
>     herr_t  status, status_n;
>     int    *datar;
>
>     file    = H5Fopen(FILENAME, H5F_ACC_RDONLY, H5P_DEFAULT);
>     dataset = H5Dopen2(file, DATASETNAME, H5P_DEFAULT);
>
>     filespace = H5Dget_space(dataset);
>     status_n  = H5Sget_simple_extent_dims(filespace, dimsr, NULL);
>
>     /* The memory space is dense: it holds only the selected elements. */
>     memspace_dims[0] = dimsr[0] / strides[0];
>     memspace_dims[1] = dimsr[1] / strides[1];
>     memspace = H5Screate_simple(RANK, memspace_dims, NULL);
>
>     count[0] = memspace_dims[0];
>     count[1] = memspace_dims[1];
>
>     /* Core of this test: a hyperslab selection with a varying stride. */
>     H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offsets, strides,
>                         count, NULL);
>
>     datar = calloc(memspace_dims[0] * memspace_dims[1], sizeof(int));
>
>     printf("reading with stride = %d, memspace_dims: %d %d, count: %d %d\n",
>            (int) strides[0], (int) memspace_dims[0], (int) memspace_dims[1],
>            (int) count[0], (int) count[1]);
>
>     time_t t1 = time(NULL);
>     status = H5Dread(dataset, H5T_NATIVE_INT, memspace, filespace,
>                      H5P_DEFAULT, datar);
>     time_t t2 = time(NULL);
>
>     printf("done reading with stride = %d, time = %d (nearest sec)\n",
>            (int) strides[0], (int) (t2 - t1));
>
>     status = H5Dclose(dataset);
>     status = H5Sclose(filespace);
>     status = H5Sclose(memspace);
>     status = H5Fclose(file);
>     free(datar);
> }
>
> int main(void)
> {
>     write_file();
>     read_file(1, 1);  /* contiguous: fast */
>     read_file(2, 1);  /* stride of 2 along dim 0: ~15x slower */
>     return 0;
> }
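>
> One experiment I haven't tried yet, based on the same-shape observation
> above (a sketch only, not verified to hit the faster path): select the
> same strided hyperslab in the memory space, so that both selections have
> identical shapes. The data then lands at strided offsets in a full-size
> buffer instead of being packed densely. Using the variables from
> read_file() above:
>
> /* Sketch (untested): same-shape selections on both spaces. */
> hid_t full_memspace = H5Screate_simple(RANK, dimsr, NULL);
> H5Sselect_hyperslab(full_memspace, H5S_SELECT_SET, offsets, strides,
>                     count, NULL);
>
> /* filespace already carries the identical strided selection, so the
>  * memory and file selection shapes now match. */
> int *sparse = calloc(dimsr[0] * dimsr[1], sizeof(int));
> H5Dread(dataset, H5T_NATIVE_INT, full_memspace, filespace,
>         H5P_DEFAULT, sparse);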
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
