Hi Roger,

It looks like you are also using NFS. Parallel HDF5 is not designed to work with NFS; it requires a parallel file system.
Elena

On Jun 1, 2010, at 5:26 PM, Roger Martin wrote:

> Hi Quincey,
>
> Ah, I will have to design around it. I'm starting to understand that parallel
> HDF5 is for specific situations and is not applicable to coarse concurrency
> with queues feeding an arbitrary number of MPI processes.
>
> I had hoped otherwise, since H5TBmake_table isn't listed on
> http://www.hdfgroup.org/HDF5/doc/RM/CollectiveCalls.html (understandably a
> young document) and http://www.hdfgroup.org/HDF5/Tutor/pprog.html says:
> ....
> Dataset Operations: Write to or read from a dataset
> (Array data transfer can be collective or independent.)
>
> Once a file is opened by the processes of a communicator:
> Each process writes to an individual dataset.
> ....
> I'm catching on: not for tables, where metadata is being added by individual
> processes. My current design has the rank-0 process doing a bunch of extra
> initialization and then queuing work to the other processes.
>
> Thank you
>
> Quincey Koziol wrote:
>> Hi Roger,
>>
>> On Jun 1, 2010, at 3:36 PM, Roger Martin wrote:
>>
>>> Hi,
>>>
>>> I'm testing 1.8.4 and 1.8.5-pre2 on CentOS 5.4, NFS v3, 64-bit, with shared
>>> libraries (is there additional info that would be useful?). Both build
>>> successfully with:
>>> ----
>>> ./configure --prefix=/net/frodo/home/roger/Software/hdf5-1.8.5-pre2/dist
>>> --disable-fortran --enable-parallel
>>> --with-zlib=/net/frodo/home/roger/sourceforge/zlib-1.2.4/dist
>>> --without-szlib --enable-shared=yes --enable-static=no
>>>
>>> make
>>> #make check
>>> make install
>>> ----
>>>
>>> All processes do the collective H5Fcreate, and then rank 0 begins to
>>> create tables using the H5TBmake_table call while the others wait.
>>>
>>> 1) It gets through 28 tables and then stalls on the way to 196 tables.
>>> 2) If the chunk_size is changed from 10 to 1000, it gets through 44 tables
>>> and then stalls.
>>>
>> Are you making the H5TBmake_table() calls from only one process in a
>> parallel application? If so, that's certainly the problem.
>>
>> Quincey
>>
>>> I'll try to make a small app that reveals the same stall, but for now here
>>> is the portion of code that sets up the table:
>>> ----------
>>> typedef struct Particle
>>> {
>>>     char element[3];
>>>     char atomName[16];
>>>     char residueName[16];
>>>     double x;
>>>     double y;
>>>     double z;
>>>     int formalCharge;
>>> } Particle;
>>>
>>> Particle dst_buf[elements.size()];
>>>
>>> /* Calculate the size and the offsets of our struct members in memory */
>>> size_t dst_size = sizeof(Particle);
>>> size_t dst_offset[7] = {HOFFSET(Particle, element),
>>>                         HOFFSET(Particle, atomName),
>>>                         HOFFSET(Particle, residueName),
>>>                         HOFFSET(Particle, x),
>>>                         HOFFSET(Particle, y),
>>>                         HOFFSET(Particle, z),
>>>                         HOFFSET(Particle, formalCharge)};
>>>
>>> size_t dst_sizes[7] = {sizeof(dst_buf[0].element),
>>>                        sizeof(dst_buf[0].atomName),
>>>                        sizeof(dst_buf[0].residueName),
>>>                        sizeof(dst_buf[0].x),
>>>                        sizeof(dst_buf[0].y),
>>>                        sizeof(dst_buf[0].z),
>>>                        sizeof(dst_buf[0].formalCharge)};
>>>
>>> /* Define an array of Particles */
>>> Particle p_data[elements.size()];
>>> for (int index = 0; index < elements.size(); index++)
>>> {
>>>     std::strncpy(p_data[index].element, elements[index].c_str(), 3);
>>>     std::strncpy(p_data[index].atomName, atomNames[index].c_str(), 16);
>>>     std::strncpy(p_data[index].residueName, residueNames[index].c_str(), 16);
>>>     p_data[index].x = (*xyz)[0][index];
>>>     p_data[index].y = (*xyz)[1][index];
>>>     p_data[index].z = (*xyz)[2][index];
>>>     p_data[index].formalCharge = formalCharge[index];
>>> }
>>>
>>> hsize_t chunk_size = 10;
>>> int *fill_data = NULL;
>>> int compress = 1;
>>> if (parallel) compress = 0;
>>>
>>> /* Define field information */
>>> const char *field_names[7] = {"Element", "AtomName", "ResidueName", "X",
>>>                               "Y", "Z", "FormalCharge"};
>>> /* Initialize field_type */
>>> hid_t string_element_type = H5Tcopy(H5T_C_S1);
>>> hid_t string_type = H5Tcopy(H5T_C_S1);
>>> H5Tset_size(string_element_type, 3);
>>> H5Tset_size(string_type, 16);
>>> hid_t field_type[7];
>>> field_type[0] = string_element_type;
>>> field_type[1] = string_type;
>>> field_type[2] = string_type;
>>> field_type[3] = H5T_NATIVE_DOUBLE;
>>> field_type[4] = H5T_NATIVE_DOUBLE;
>>> field_type[5] = H5T_NATIVE_DOUBLE;
>>> field_type[6] = H5T_NATIVE_INT;
>>>
>>> std::cout << "create table " << datasetName << " " << elements.size() << std::endl;
>>> herr_t status = H5TBmake_table(title.c_str(), currentFileID,
>>>                                datasetName.c_str(), 7, elements.size(),
>>>                                dst_size, field_names, dst_offset, field_type,
>>>                                chunk_size, fill_data, compress, p_data);
>>> std::cout << "create table " << status << std::endl;
>>>
>>> ...
>>>
>>> H5Tclose(string_type);
>>> H5Tclose(string_element_type);
>>>
>>> ---------------
>>>
>>> The only things closed during this loop of 196 table-making calls are the
>>> string_type and string_element_type. There are two attributes set on the
>>> tables, but the same stall happens without these extra attributes too.
>>>
>>> _______________________________________________
>>> Hdf-forum is for HDF software users discussion.
>>> [email protected]
>>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
