Just to follow up: I tried switching to collective I/O instead and saw a 2x
speedup.  I also played around with chunking and chunk cache parameters a
fair bit, but that only ever got me about a 1.5x improvement.
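
For reference, the only HDF5-level change needed for collective I/O is the
data transfer property list (compare the independent-mode transfer list in
the code quoted below); roughly:

! Data transfer property list requesting collective MPI-IO
call h5pcreate_f(H5P_DATASET_XFER_F, plist, hdf5_err)
call h5pset_dxpl_mpio_f(plist, H5FD_MPIO_COLLECTIVE_F, hdf5_err)
! ... then pass xfer_prp = plist to h5dwrite_f as before ...

(Note that in collective mode every rank has to participate in each
h5dwrite_f call, even if some ranks have an empty selection for a given
dataset.)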

After a little more searching I found this investigation (
http://confluence.diamond.ac.uk/display/HDF5DEV/HDF5+writer+performance+review)
that saw similarly poor performance for a few benchmarks when using MPI.  I
also found this presentation (
https://www.nersc.gov/assets/Uploads/Readingand-WritingDataNUGFeb2012.ppt)
which observed similar performance with MPI-IO.  It also led me to adjust
the Lustre OST striping parameters for the directory I'm creating the file
in, which helped me bump the performance up to around 200 MB/s.
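
For completeness: the striping can also be requested through MPI-IO hints
on the file access property list, instead of (or in addition to) setting it
on the directory.  Whether the "striping_factor" / "striping_unit" hints
are honored depends on the MPI-IO implementation, and on Lustre they only
take effect when the file is created; the values here are placeholders:

integer :: info, mpi_err

! Lustre striping hints understood by ROMIO / Cray MPT
call MPI_Info_create(info, mpi_err)
call MPI_Info_set(info, "striping_factor", "32", mpi_err)    ! number of OSTs
call MPI_Info_set(info, "striping_unit", "1048576", mpi_err) ! stripe size (bytes)

call h5pcreate_f(H5P_FILE_ACCESS_F, plist, hdf5_err)
call h5pset_fapl_mpio_f(plist, MPI_COMM_WORLD, info, hdf5_err)
! ... create the file with this property list instead of MPI_INFO_NULL ...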

At this point I'm thinking that my problem is probably not HDF5 itself, but
the MPI-IO implementation and the filesystem underneath it.

Nick





On Sun, Dec 21, 2014 at 1:47 AM, Nick Horelik <[email protected]> wrote:

> Hello,
>
> I am updating a Monte Carlo particle transport code written in Fortran to
> be run on Titan (http://en.wikipedia.org/wiki/Titan_%28supercomputer%29),
> and I'm having some trouble achieving good performance writing to many
> small sections of large datasets with independent I/O.  Monitoring with du,
> it looks like I'm writing only about 50 MB/s while creating a 78GB file.  I
> was hoping to get much better from the Lustre filesystem.
>
> For my current case I'm trying to write to 3000 1D datasets in the same
> file containing 3,534,960 doubles each, where each individual write will
> always be a contiguous 1D chunk of 206 doubles.  These writes are coming
> from 980 different MPI processes, and each chunk of 206 that needs to be
> written could be in any of the 3000 datasets, anywhere within the large 1D
> array.  I'm doing things in batches: each MPI process collects a bunch of
> these chunks before doing a round of writing.
>
> It's trivial to calculate where in each of the large datasets each chunk
> should be written to, so at first I just looped through each chunk to write
> to and selected the appropriate hyperslab before writing.  I figured this
> might be slow because it was jumping all around the datasets for each
> little write, so I switched to a scheme where I first sort the writes in
> the order they would appear in the dataset, and then create a large union
> of irregularly-spaced hyperslabs and do one big write, similar to the
> difference discussed here:
> http://hdf-forum.184993.n3.nabble.com/reading-multiple-irregularly-spaced-hyperslabs-td426929.html
> .
>
> This gave me about a 2 or 3x speedup from where I was, but I'm still
> pretty unsatisfied with these write speeds.  I'm still a little new to
> HDF5, so I'm worried I might be missing something fundamental.  I've done a
> little playing around with setting the chunking on these datasets to
> multiples of 206, but I haven't had any success there.
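>
> In case it helps, this is roughly how I set the chunking when I tried it
> (the chunk size here is just an example multiple of 206):
>
> integer(HID_T) :: dcpl
> integer(HSIZE_T) :: chunk_dims(1)
>
> ! Dataset creation property list with a chunked layout
> call h5pcreate_f(H5P_DATASET_CREATE_F, dcpl, hdf5_err)
> chunk_dims(1) = 206 * 512
> call h5pset_chunk_f(dcpl, 1, chunk_dims, hdf5_err)
> call h5dcreate_f(group_id, 'data', H5T_NATIVE_DOUBLE, dspace, dset, &
>     hdf5_err, dcpl_id = dcpl)
> call h5pclose_f(dcpl, hdf5_err)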
>
> Is there a way to accomplish the write pattern I described more
> efficiently than what I'm doing?  Example code below shows what I'm doing
> now.  This is with HDF5 1.8.11.
>
> Thanks in advance for any advice!
>
> Nick
>
> ---
>
> integer(HID_T) :: file_id, group_id, dset, dspace, memspace, plist
> integer(HSIZE_T) :: dims(1), block(1), count(1), start(1)
> type(c_ptr) :: f_ptr  ! from iso_c_binding; the chunks array needs the
>                       ! target attribute for c_loc to be valid
>
> ! All the datasets are created up front once
>
> dims(1) = 3534960
> do i = 1, 3000
>   call h5fopen_f(filename, H5F_ACC_RDWR_F, file_id, hdf5_err)
>   call h5gopen_f(file_id, groupname(i), group_id, hdf5_err)
>   call h5screate_simple_f(1, dims, dspace, hdf5_err)
>   call h5dcreate_f(group_id, 'data', H5T_NATIVE_DOUBLE, dspace, dset, &
>       hdf5_err)
>   call h5dclose_f(dset, hdf5_err)
>   call h5sclose_f(dspace, hdf5_err)
>   call h5gclose_f(group_id, hdf5_err)
>   call h5fclose_f(file_id, hdf5_err)
> end do
>
> ...
>
> ! The rest is run on each MPI process, which has n_chunks chunks of 206
> ! to write.
> ! The starting index in the large dataset for each chunk j is stored in
> ! chunk_start(j).
> ! The chunks themselves are stored in one large contiguous array of
> ! doubles: chunks
>
> ! Setup file access property list with parallel I/O access
> call h5pcreate_f(H5P_FILE_ACCESS_F, plist, hdf5_err)
> call h5pset_fapl_mpio_f(plist, MPI_COMM_WORLD, MPI_INFO_NULL, hdf5_err)
>
> ! Open the file
> call h5fopen_f(filename, H5F_ACC_RDWR_F, file_id, hdf5_err, &
>     access_prp = plist)
>
> ! Close property list
> call h5pclose_f(plist, hdf5_err)
>
> ! Create the property list to describe independent parallel I/O
> call h5pcreate_f(H5P_DATASET_XFER_F, plist, hdf5_err)
> call h5pset_dxpl_mpio_f(plist, H5FD_MPIO_INDEPENDENT_F, hdf5_err)
>
> do i = 1, 3000
>
>   block = 206
>   count = 1
>
>   ! Open the group
>   call h5gopen_f(file_id, groupname(i), group_id, hdf5_err)
>
>   ! Open the dataset
>   call h5dopen_f(group_id, 'data', dset, hdf5_err)
>
>   ! Open the dataspace and memory space
>   call h5dget_space_f(dset, dspace, hdf5_err)
>   call h5screate_simple_f(1, block * n_chunks, memspace, hdf5_err)
>
>   ! Select the irregular hyperslab
>   call h5sselect_none_f(dspace, hdf5_err)
>   do j = 1, n_chunks
>
>     idx = chunk_start(j)
>     start = (idx - 1) * block
>
>     ! Select the hyperslab
>     call h5sselect_hyperslab_f(dspace, H5S_SELECT_OR_F, start, &
>         count, hdf5_err, block = block)
>
>   end do
>
>   ! Write the data
>   chunks = chunks_for_group(i)
>   f_ptr = c_loc(chunks(1))
>   call h5dwrite_f(dset, H5T_NATIVE_DOUBLE, f_ptr, hdf5_err, &
>       file_space_id = dspace, mem_space_id = memspace, &
>       xfer_prp = plist)
>
>   ! Close the dataspace, memory space, dataset, and group
>   call h5sclose_f(dspace, hdf5_err)
>   call h5sclose_f(memspace, hdf5_err)
>   call h5dclose_f(dset, hdf5_err)
>   call h5gclose_f(group_id, hdf5_err)
>
> end do
>
> ! Close the transfer property list and the file
> call h5pclose_f(plist, hdf5_err)
> call h5fclose_f(file_id, hdf5_err)
>


