OK, property list type corrected in the attached. With the correct property
list type, I’m getting an infinite loop closing the library (error stack
attached) if I set the avoid_truncate flag. Same code with H5P_DEFAULT works
fine.
I’ll try the collective metadata write next. Data is contiguous, but is written
out as separate hyperslabs across ranks.
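For context, the per-rank writes look roughly like the following sketch (illustrative only; the dataset name "big_array", the per-rank count local_n, and the buffer local_buf are placeholders rather than my actual code, and error checking is omitted):

  // Each rank writes its contiguous slice of a 1-D dataset as a hyperslab,
  // using a collective transfer property list.
  hsize_t total[1] = {(hsize_t) mpi_size * local_n};   // global extent (assumed)
  hsize_t count[1] = {local_n};                        // elements on this rank
  hsize_t start[1] = {(hsize_t) mpi_rank * local_n};   // this rank's offset
  hid_t filespace = H5Screate_simple(1, total, NULL);
  hid_t dset = H5Dcreate(file_id, "big_array", H5T_NATIVE_DOUBLE, filespace,
                         H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  hid_t memspace = H5Screate_simple(1, count, NULL);
  H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
  hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
  H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);        // collective raw-data write
  H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, local_buf);
  H5Pclose(dxpl);
  H5Sclose(memspace);
  H5Sclose(filespace);
  H5Dclose(dset);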
Thanks!
Jarom
From: Hdf-forum [mailto:[email protected]] On Behalf Of
Mohamad Chaarawi
Sent: Wednesday, February 24, 2016 11:46 AM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] H5Fcreate on Parallel HDF5: H5F_ACC_TRUNC vs.
H5F_ACC_EXCL
Tens of datasets and groups is a reasonable amount of metadata to see benefit
from the collective metadata write option. If the datasets are
chunked, that means more metadata is generated too. I would try the collective
write feature and see if the file close speeds up. It does so significantly in
most scenarios we have tested it with.
Mohamad
From: Hdf-forum [mailto:[email protected]] On Behalf Of
Mohamad Chaarawi
Sent: Wednesday, February 24, 2016 11:40 AM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] H5Fcreate on Parallel HDF5: H5F_ACC_TRUNC vs.
H5F_ACC_EXCL
Hi Jarom,
You are using the file access property list for the H5Pset_avoid_truncate call.
This requires a file creation property list.
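In other words, something like this minimal sketch (placeholder names; note that H5Pset_avoid_truncate only exists in the avoid_truncate branch, not in any release):

  hid_t fcpl_id = H5Pcreate(H5P_FILE_CREATE);              // file *creation* property list
  H5Pset_avoid_truncate(fcpl_id, H5F_AVOID_TRUNCATE_ALL);  // branch-only call
  hid_t fapl_id = H5Pcreate(H5P_FILE_ACCESS);              // file access property list
  H5Pset_fapl_mpio(fapl_id, comm, info);
  hid_t file_id = H5Fcreate("out.h5", H5F_ACC_TRUNC, fcpl_id, fapl_id);
  H5Pclose(fcpl_id);
  H5Pclose(fapl_id);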
Thanks,
Mohamad
From: Hdf-forum <[email protected]> on behalf of "Nelson, Jarom" <[email protected]>
Reply-To: hdf-forum <[email protected]>
Date: Wednesday, February 24, 2016 at 1:28 PM
To: hdf-forum <[email protected]>
Subject: Re: [Hdf-forum] H5Fcreate on Parallel HDF5: H5F_ACC_TRUNC vs. H5F_ACC_EXCL
Testing out the H5Pset_avoid_truncate call, I get the following error when I
attempt to set this property on my file access property list:
HDF5-DIAG: Error detected in HDF5 (1.9.233) MPI-process 0:
#000: H5Pfcpl.c line 1422 in H5Pset_avoid_truncate(): can't find object for ID
major: Object atom
minor: Unable to find atom information (already closed?)
#001: H5Pint.c line 3789 in H5P_object_verify(): property list is not a
member of the class
major: Property lists
minor: Unable to register new atom
Attached is my simple test program that produces the above error.
Despite the error, my application ran to completion and the file generated
appears to work correctly, though it is a very simple test case file. However,
I suspect that the property is not being set correctly, because the change does
not seem to improve the time it takes to close the HDF5 file. Comparisons using
my non-toy program show the time to write and close the file increasing from ~9
seconds at 64 ranks to ~12 seconds at 128 ranks.
Note, I’m cautiously optimistic that I built the library correctly from the
branch given. The avoid_truncate branch checked out from svn didn’t build with
the release build instructions as written. I ended up doing the following:
### install autoconf version 2.69
./autogen.sh
CC=$(which mpicc) ./configure --enable-parallel --with-zlib
make ### this failed complaining that libtool didn’t have any targets configured, or some similar error message
./config.status
./config.lt
make ### this now worked, I think
make check
make install
make check-install
After that, the new library linked OK with my test code, and it appears to run
despite the error. TBD whether it is actually working correctly.
Jarom
From: Hdf-forum [mailto:[email protected]] On Behalf Of
Nelson, Jarom
Sent: Tuesday, February 23, 2016 2:55 PM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] H5Fcreate on Parallel HDF5: H5F_ACC_TRUNC vs.
H5F_ACC_EXCL
Thanks for your help. I’ll attempt to build my application using your
avoid_truncate branch and see if it helps with the truncation issue. I may ping
for assistance here, since my initial attempts to build HDF5 from the
avoid_truncate branch are coming up with some problems.
Regarding the metadata issue, I don’t have any extra metadata other than that
required for creating datasets and groups. I wouldn’t call this a “significant
amount” of metadata. We are talking tens of datasets in a handful of groups,
mostly written only by rank 0. The bulk of the data written is in one array
distributed across ranks and written out in hyperslabs in parallel. Though, since
all the Dataset and Group creation calls are collective, they may generate more
metadata than I realize.
Should I be concerned about the collective calls to create groups and datasets
generating a large overhead of metadata and slowing down the file write and
close? Or is it just when the application generates a large amount of extra
metadata that the metadata write can start to be a significant slowdown?
Jarom
From: Hdf-forum [mailto:[email protected]] On Behalf Of
Mohamad Chaarawi
Sent: Tuesday, February 23, 2016 10:23 AM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] H5Fcreate on Parallel HDF5: H5F_ACC_TRUNC vs.
H5F_ACC_EXCL
Hi Jarom,
H5F_ACC_TRUNC does not have anything to do with calling mpio_truncate at file
close. You should use that flag when creating the file if you want to overwrite
an existing file. What this mode does at file open is call MPI_File_set_size()
with size 0, basically to empty the file.
Now, for the file close performance issue, there could be several causes. One of
them could be the truncate issue. We actually have a patch for that which avoids
truncating the file at file close and instead modifies the file format to store
both EOA and EOF. This does not work with the 1.8 release. Unfortunately it won't
be in the 1.10.0 release either, because there are other issues in the library
that have to be resolved before this can be merged in. I do highly anticipate
that it will be in 1.10.1 though. For now you can test whether this is the actual
cause by using this development branch of HDF5 here:
https://svn.hdfgroup.org/hdf5/features/avoid_truncate/
And set H5Pset_avoid_truncate(fcpl, H5F_AVOID_TRUNCATE_ALL); on the file
creation property list.
Again, this is a development branch (not production), so don't keep HDF5 files
you create with it.
The second issue (more likely the cause of the bad performance, I believe) could
be the cost of writing out metadata at file close. If your application generates
a significant amount of HDF5 metadata, writing out the metadata at file close is
currently very costly with the 1.8 release; it isn't done in a manner friendly to
parallel file systems. In 1.10 we improved this by adding an option that lets
users have the metadata writes done at file close issued in one collective MPI
write call. To use that feature, you can install the HDF5 trunk version and set
this property on the file access property list:
H5Pset_coll_metadata_write (fapl_id, true);
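For example, a minimal sketch of the intended usage on the MPI-IO file access property list (placeholder names; trunk/1.10 only):

  hid_t fapl_id = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_mpio(fapl_id, comm, info);        // MPI-IO file driver
  H5Pset_coll_metadata_write(fapl_id, true);    // flush metadata collectively at close
  hid_t file_id = H5Fcreate("out.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl_id);
  H5Pclose(fapl_id);
  /* ... create groups/datasets and write data ... */
  H5Fclose(file_id);                            // metadata written in one collective MPI call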
You can get HDF5 trunk here:
https://svn.hdfgroup.org/hdf5/trunk/
This feature will be in the 1.10.0 release that should be out in the next month
or so.
Thanks,
Mohamad
From: Hdf-forum <[email protected]> on behalf of "Nelson, Jarom" <[email protected]>
Reply-To: hdf-forum <[email protected]>
Date: Tuesday, February 23, 2016 at 11:53 AM
To: hdf-forum <[email protected]>
Subject: [Hdf-forum] H5Fcreate on Parallel HDF5: H5F_ACC_TRUNC vs. H5F_ACC_EXCL
Is H5F_ACC_TRUNC the right create option for pre-existing files with parallel
HDF5 when I want to overwrite the existing file?
Looking at the performance, it seems that one of the most costly actions when
running my application is closing out the file. The top “hot call path” includes
the H5FD_mpio_truncate method, and the comments on that function indicate that
keeping track of EOF is a costly operation over MPI. Looking through the code, it
appears that if I don’t use H5F_ACC_TRUNC, the overhead of H5FD_mpio_truncate is
avoided.
I’ve used H5F_ACC_TRUNC in serial codes without problems, and it is the option
used in several of the parallel example applications.
Is it better to just delete the pre-existing file and then create the file
using H5F_ACC_EXCL?
Am I missing something here?
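For what it's worth, here is a minimal sketch of the delete-then-create-exclusive alternative I have in mind (illustration only, error checking omitted; outfilename, comm, and info are placeholders):

  // One rank removes any pre-existing file, then all ranks create it exclusively.
  if (mpi_rank == 0) {
    MPI_File_delete(outfilename.c_str(), MPI_INFO_NULL);  // ignore the error if the file is absent
  }
  MPI_Barrier(comm);                                       // make sure the delete completes first
  hid_t fapl_id = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_mpio(fapl_id, comm, info);
  hid_t file_id = H5Fcreate(outfilename.c_str(), H5F_ACC_EXCL, H5P_DEFAULT, fapl_id);
  H5Pclose(fapl_id);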
Jarom Nelson
//
// Created by nelson99 on 1/5/16.
//
#include "hdf5.h"
#include <iostream>
#include <string>
#include <assert.h>
#include <mpi.h>
#include <cstdint>
#include <cstring>

/**
 * @brief Simple tests to work through library integration issues with Parallel HDF5
 */
int main(int argc, char **argv) {
  try {
    /*
     * MPI variables
     */
    int mpi_size, mpi_rank;
    MPI_Comm comm = MPI_COMM_WORLD;
    MPI_Info info = MPI_INFO_NULL;

    /*
     * Initialize MPI
     */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(comm, &mpi_size);
    MPI_Comm_rank(comm, &mpi_rank);

    std::string outfilename;
    if (mpi_size > 1) {
      outfilename = "h5g_output_parallel.h5";
    } else {
      outfilename = "h5g_output_serial.h5";
    }

    hid_t fapl_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl_id, comm, info);

    hid_t fcpl_id = H5P_DEFAULT;
    // requires HDF5 version 1.10.1 or greater (or avoid_truncate subversion
    // trunk, but that is not merged with 1.10 for coll_metadata below)
    fcpl_id = H5Pcreate(H5P_FILE_CREATE);
    H5Pset_avoid_truncate(fcpl_id, H5F_AVOID_TRUNCATE_ALL);
    // requires HDF5 version 1.10
    // H5Pset_coll_metadata_write(fapl_id, true);

    hid_t file_id = H5Fcreate(outfilename.c_str(), H5F_ACC_TRUNC, fcpl_id,
                              fapl_id);
    H5Pclose(fapl_id);
    H5Pclose(fcpl_id);

    hsize_t dims[1] = {1};
    hid_t datatype = H5T_STD_I8LE;
    std::int8_t data1[1] = {(std::int8_t) (mpi_rank + 100)};
    std::int8_t data2[1] = {(std::int8_t) (mpi_rank - 100)};
    // hid_t datatype = H5T_STD_I32LE;
    // std::int32_t data[1] = {mpi_rank};

    // dataspace is the same for all the datasets below
    hid_t dataspace = H5Screate_simple(1, dims, dims);

    // create a common group to contain distinct datasets for each rank
    hid_t common_group = H5Gcreate(file_id, "common group", H5P_DEFAULT,
                                   H5P_DEFAULT, H5P_DEFAULT);
    std::cout << "rank " << mpi_rank << ": /common group/ ID: "
              << common_group << std::endl;

    // do collective calls to create all the distinct datasets for each rank
    // (each rank must create each dataset)
    hid_t dataset_by_rank[mpi_size];
    for (int i = 0; i < mpi_size; ++i) {
      std::string rank_name = "rank";
      rank_name += std::to_string(i);
      std::cout << rank_name << std::endl;
      dataset_by_rank[i] = H5Dcreate(common_group, rank_name.c_str(), datatype,
                                     dataspace, H5P_DEFAULT, H5P_DEFAULT,
                                     H5P_DEFAULT);
      std::cout << "rank " << mpi_rank << " /common group/" << rank_name
                << " ID: " << dataset_by_rank[i] << std::endl;
    }

    // set up dataset transfer property list for collective MPI I/O
    hid_t xferplist = H5Pcreate(H5P_DATASET_XFER);
    // H5Pset_dxpl_mpio(xferplist, H5FD_MPIO_INDEPENDENT);
    H5Pset_dxpl_mpio(xferplist, H5FD_MPIO_COLLECTIVE);

    // each rank writes its own rank to the corresponding dataset for that rank
    H5Dwrite(dataset_by_rank[mpi_rank], datatype, H5S_ALL, H5S_ALL, xferplist,
             data1);

    // collective calls to close each dataset
    for (int i = 0; i < mpi_size; ++i) {
      H5Dclose(dataset_by_rank[i]);
    }
    H5Gclose(common_group);

    // do collective calls to create all the groups for every rank
    // (each rank must create each group, and each dataset within each group)
    hid_t group_by_rank[mpi_size];
    for (int i = 0; i < mpi_size; ++i) {
      std::string rank_name = "rank";
      rank_name += std::to_string(i);
      std::cout << rank_name << std::endl;
      group_by_rank[i] = H5Gcreate(file_id, rank_name.c_str(),
                                   H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
      std::cout << "rank " << mpi_rank << " /" << rank_name << "/ ID: "
                << group_by_rank[i] << std::endl;
      dataset_by_rank[i] = H5Dcreate(group_by_rank[i], "common dataset", datatype,
                                     dataspace, H5P_DEFAULT, H5P_DEFAULT,
                                     H5P_DEFAULT);
      std::cout << "rank " << mpi_rank << " /" << rank_name
                << "/common dataset ID: " << dataset_by_rank[i] << std::endl;
    }

    // then each rank does an independent call to write data to the corresponding dataset
    H5Dwrite(dataset_by_rank[mpi_rank], datatype, H5S_ALL, H5S_ALL, xferplist,
             data2);

    H5Pclose(xferplist);
    H5Sclose(dataspace);
    for (int i = 0; i < mpi_size; ++i) {
      H5Dclose(dataset_by_rank[i]);
      H5Gclose(group_by_rank[i]);
    }
    H5Fclose(file_id);
    MPI_Finalize();
  } catch (std::exception &e) {
    std::cerr << "std::exception thrown:" << e.what() << std::endl;
    return -1;
  } catch (int e) {
    std::cerr << "Unrecognized error thrown" << e << std::endl;
    return e ? e : -1;
  }
  return 0;
}
HDF5: infinite loop closing library
L,T_top,P,P,AC,FD,E,SL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL
h5g_parallel:25191 terminated with signal 6 at PC=2aaaacf01625 SP=7fffffffc6f8.
Backtrace:
/lib64/libc.so.6(gsignal+0x35)[0x2aaaacf01625]
/lib64/libc.so.6(abort+0x175)[0x2aaaacf02e05]
/g/g19/nelson99/lib/hdf5-1.10-notrunc/lib/libhdf5.so.6(H5_term_library+0x18f7)[0x2aaaaaf32537]
/g/g19/nelson99/lib/hdf5-1.10-notrunc/lib/libhdf5.so.6(+0x42099)[0x2aaaaaf33099]
/usr/local/tools/mvapich2-gnu-debug-2.1/lib/libmpi.so.12(MPIR_Attr_delete_c_proxy+0x85)[0x2aaaac07833d]
/usr/local/tools/mvapich2-gnu-debug-2.1/lib/libmpi.so.12(MPIR_Call_attr_delete+0x7a)[0x2aaaac077dcd]
/usr/local/tools/mvapich2-gnu-debug-2.1/lib/libmpi.so.12(MPIR_Attr_delete_list+0x9d)[0x2aaaac07814e]
/usr/local/tools/mvapich2-gnu-debug-2.1/lib/libmpi.so.12(PMPI_Finalize+0xb7)[0x2aaaabfe6789]
./h5g_parallel(main+0x806)[0x4023c6]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x2aaaaceedd5d]
./h5g_parallel[0x4019b9]