Hi Elena,

A simple code demonstrating this issue is attached. Please try modifying
the variables "NGroup", "LibVerLow", and "LibVerHigh". NGroup gives the number
of groups for a fixed number of datasets (NDataset), and the other two
variables specify the file format. The size of each dataset is ~2 KB.

I tried four different cases, combining NGroup=1 or 128 with
LibVerLow=H5F_LIBVER_EARLIEST or H5F_LIBVER_18. For NGroup=1, the I/O
bandwidth drops dramatically once the file size exceeds ~3.4 GB. For
NGroup=128, the bandwidth stays reasonable. The results are similar for
different LibVerLow settings (in fact, slightly worse for H5F_LIBVER_18
and H5F_LIBVER_LATEST than for H5F_LIBVER_EARLIEST).

Some system specs:
HDF5 version: 1.8.16
CPU: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
File system: GPFS
OS: CentOS release 6.7

Sincerely,
Justin

2016-02-19 17:41 GMT-06:00 Elena Pourmal <[email protected]>:

> Justin,
>
> Will it be possible for you to provide a program that illustrates the
> problem? Which version of the library are you using? On which system are
> you running your application?
>
> Thank you!
>
> Elena
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Elena Pourmal  The HDF Group  http://hdfgroup.org
> 1800 So. Oak St., Suite 203, Champaign IL 61820
> 217.531.6112
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
>
>
> On Feb 19, 2016, at 4:03 PM, Hsi-Yu Schive <[email protected]> wrote:
>
> Thanks for the suggestion. The performance I reported was measured using
> the earliest file format (i.e., H5F_LIBVER_EARLIEST). I just tried
> H5F_LIBVER_18, but it leads to even worse performance. The bandwidth
> starts to drop when N > ~0.5 million. Using H5F_LIBVER_LATEST does not
> help either.
>
> Justin
>
> 2016-02-19 8:26 GMT-06:00 Gerd Heber <[email protected]>:
>
>> Are you using the latest version of the file format? In other words, are
>> you using H5P_DEFAULT (-> earliest) as your file access property list, or
>> have you created one which sets the library version bounds to H5F_LIBVER_18?
>>
>>
>>
>> See
>> https://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetLibverBounds
>>
>>
>>
>> In the newer version of the file format, groups with large numbers of links
>> and attributes are managed more efficiently.
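>>
>> A minimal sketch of what I mean (the file name and variable names are just
>> for illustration):
>>
>>    /* request the 1.8 file format via a file access property list */
>>    hid_t fapl = H5Pcreate( H5P_FILE_ACCESS );
>>    H5Pset_libver_bounds( fapl, H5F_LIBVER_18, H5F_LIBVER_LATEST );
>>
>>    /* pass it to H5Fcreate/H5Fopen instead of H5P_DEFAULT */
>>    hid_t file = H5Fcreate( "test.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl );
>>    H5Pclose( fapl );
>>
>>    /* ... create groups and datasets as before ... */
>>    H5Fclose( file );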
>>
>>
>>
>> Does that solve your problem?
>>
>>
>>
>> Best, G.
>>
>>
>>
>>
>>
>> *From:* Hdf-forum [mailto:[email protected]] *On
>> Behalf Of *Hsi-Yu Schive
>> *Sent:* Thursday, February 18, 2016 2:36 PM
>> *To:* [email protected]
>> *Subject:* [Hdf-forum] I/O bandwidth drops dramatically and
>> discontinuously for a large number of small datasets
>>
>>
>>
>> I am encountering a sudden drop in I/O bandwidth when the number of datasets
>> in a single group exceeds around 1.7 million. Below I describe the issue in
>> more detail.
>>
>>
>>
>> I'm converting adaptive mesh refinement data to HDF5 format. Each
>> dataset contains a small 4-D array with a size of ~10 KB, stored in the
>> compact layout. All datasets are stored in the same group. When the total
>> number of datasets (N) is below ~1.7 million, I get an I/O bandwidth of
>> ~100 MB/s, which is acceptable. However, when N exceeds ~1.7 million, the
>> bandwidth suddenly drops by one to two orders of magnitude.
>>
>>
>>
>> This issue seems to be related to the **number of datasets per group**
>> rather than the total data size. For example, if I reduce the size of each
>> dataset by a factor of 5 (so ~2 KB per dataset), the I/O bandwidth still
>> drops when N > ~1.7 million, even though the total data size is reduced by
>> a factor of 5.
>>
>>
>>
>> So I was wondering what causes this issue, and whether there is any simple
>> solution to it. Since the data stored in different datasets are
>> independent of each other, I prefer not to combine them into a larger
>> dataset. My current workaround is to create several HDF5 sub-groups
>> under the main group and distribute all datasets evenly among these
>> sub-groups (so that the number of datasets per group becomes smaller).
>> With this approach the I/O bandwidth stays stable even when N > 1.7 million.
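>>
>> Roughly, the workaround looks like this (just a sketch; NSubGroup, the
>> group names, and the /main group are arbitrary choices of mine):
>>
>>    /* after creating NSubGroup sub-groups under /main, send dataset i */
>>    /* to sub-group (i % NSubGroup), so each group holds only about    */
>>    /* N/NSubGroup datasets                                            */
>>    const int NSubGroup = 128;
>>    char      name[128];
>>    for (int i=0; i<N; i++)
>>    {
>>       sprintf( name, "/main/sub_%03d/dset_%08d", i%NSubGroup, i );
>>       /* ... H5Dcreate2() / H5Dwrite() / H5Dclose() as usual ... */
>>    }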
>>
>>
>>
>> If necessary, I can post a simplified code to reproduce this issue.
>>
>>
>>
>> Hsi-Yu
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>> Twitter: https://twitter.com/hdf5
>>
>
#include "hdf5.h"
#include "sys/time.h"

int main()
{

// input parameters
   const int  NDataset          = 128*128*128;        // total number of datasets
   const int  NGroup            = 128;                // total number of groups
   const int  NDatasetPerGroup  = NDataset/NGroup;    // number of datasets per group
   const int  N                 = 8;                  // V*N^3*sizeof(float) = size of each dataset
   const int  V                 = 1;
   const char FileName[]        = "Data.h5";          // output filename

// HDF5 file format (low and high)
   const H5F_libver_t LibVerLow = H5F_LIBVER_EARLIEST;
// const H5F_libver_t LibVerLow = H5F_LIBVER_18;
// const H5F_libver_t LibVerLow = H5F_LIBVER_LATEST;

// const H5F_libver_t LibVerHigh = H5F_LIBVER_18;
   const H5F_libver_t LibVerHigh = H5F_LIBVER_LATEST;


   hid_t      file_id, group_id, dataset_id, dataspace_id, fapl;
   H5G_info_t ginfo;
   hsize_t    dims[4];
   herr_t     status;
   timeval    tv1, tv2;
   char       SetName[100], GroupName[100];
   float      (*dset_data)[N][N][N] = new float [V][N][N][N];
   float      Time, SizeMB;

   
   /* Initialize the dataset. */
   for (int v=0; v<V; v++)
   for (int k=0; k<N; k++)
   for (int j=0; j<N; j++)
   for (int i=0; i<N; i++)    dset_data[v][k][j][i] = (((float)v*N+k)*N+j)*N+i;

   printf( "Data[First]  = %14.7e\n", dset_data[  0][  0][  0][  0] );
   printf( "Data[Last ]  = %14.7e\n", dset_data[V-1][N-1][N-1][N-1] );
   printf( "\n" );
   fflush( stdout );

   SizeMB = (float)NDataset*V*N*N*N*sizeof(float)/1024./1024.;
   printf( "NDataset         = %10d\n", NDataset );
   printf( "NGroup           = %10d\n", NGroup );
   printf( "NDatasetPerGroup = %10d\n", NDatasetPerGroup );
   printf( "Data size        = %13.7e MB\n", SizeMB );
   fflush( stdout );


   gettimeofday( &tv1, NULL );
   
   /* Create file with the specified format. */
   fapl    = H5Pcreate( H5P_FILE_ACCESS );
   status  = H5Pset_libver_bounds( fapl, LibVerLow, LibVerHigh );
   file_id = H5Fcreate( FileName, H5F_ACC_TRUNC, H5P_DEFAULT, fapl );
   status  = H5Pclose( fapl );   /* the property list is no longer needed */
   
   /* Create the data space for the dataset. */
   dims[0] = V;
   dims[1] = N;
   dims[2] = N;
   dims[3] = N;
   dataspace_id = H5Screate_simple( 4, dims, NULL );

   for (int g=0; g<NGroup; g++)
   {
      /* Set a group name (index zero-padded to 9 digits). */
      sprintf( GroupName, "/group_%09d", g );
      /* Create a group */
      group_id = H5Gcreate2( file_id, GroupName, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT );

      for (int t=0; t<NDatasetPerGroup; t++)
      {
         /* Set a dataset name (index zero-padded to 9 digits). */
         sprintf( SetName, "%s/dset_%09d", GroupName, t );

         /* Create a dataset. */
         dataset_id = H5Dcreate2( file_id, SetName, H5T_NATIVE_FLOAT, dataspace_id,
                                  H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT );
         
         /* Write the dataset. */
         status = H5Dwrite( dataset_id, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data );
         
         /* Close the dataset. */
         status = H5Dclose( dataset_id );
      } // for (int t=0; t<NDatasetPerGroup; t++)

      /* Obtain the group info and print the group storage type of the last group. */
      if ( g == NGroup-1 )
      {
         status = H5Gget_info( group_id, &ginfo );
         printf( "\nGroup storage type is: " );

         switch ( ginfo.storage_type )
         {
            case H5G_STORAGE_TYPE_COMPACT:        printf("H5G_STORAGE_TYPE_COMPACT\n");        break;
            case H5G_STORAGE_TYPE_DENSE:          printf("H5G_STORAGE_TYPE_DENSE\n");          break;
            case H5G_STORAGE_TYPE_SYMBOL_TABLE:   printf("H5G_STORAGE_TYPE_SYMBOL_TABLE\n");   break;
            default:                              printf("H5G_STORAGE_TYPE_UNKNOWN\n");        break;
         }
         printf( "\n" );
      }

      /* Close the group. */
      status = H5Gclose( group_id );
   } // for (int g=0; g<NGroup; g++)

   /* Close the data space for the dataset. */
   status = H5Sclose( dataspace_id );
   
   /* Close the file. */
   status = H5Fclose( file_id );

   gettimeofday( &tv2, NULL );

   /* subtract seconds first to avoid overflowing tv_sec*1000000 */
   Time = ( tv2.tv_sec - tv1.tv_sec ) + ( tv2.tv_usec - tv1.tv_usec )*1.0e-6;

   printf( "Time             = %13.7e sec\n", Time );
   printf( "Bandwidth        = %13.7e MB/sec\n", SizeMB/Time );

   delete [] dset_data;

   return 0;
}
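
For reference, a build-and-run sketch using the HDF5 compiler wrapper
(assuming h5c++ is on your PATH; the source file name is arbitrary):

   h5c++ -O2 h5_many_dsets.cpp -o h5_many_dsets
   ./h5_many_dsets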
