Hi Roger,

It looks like you are also using NFS. Parallel HDF5 is not designed to work with NFS; it requires a parallel file system.
Elena

On Jun 1, 2010, at 5:26 PM, Roger Martin wrote:

> Hi Quincey,
>
> Ah, I will have to design around it. I'm starting to understand that parallel
> HDF5 is for specific situations and is not applicable to coarse concurrency
> with queues feeding an arbitrary number of MPI processes.
>
> I had hoped otherwise, since H5TBmake_table isn't listed on
> http://www.hdfgroup.org/HDF5/doc/RM/CollectiveCalls.html (understandably a
> young document) and http://www.hdfgroup.org/HDF5/Tutor/pprog.html says:
> ....
> Dataset Operations: Write to or read from a dataset
> (Array data transfer can be collective or independent.)
>
> Once a file is opened by the processes of a communicator:
> Each process writes to an individual dataset.
> ....
> I'm catching on: not for tables, where metadata is being added by individual
> processes. My current design has the rank-0 process doing a bunch of extra
> initialization and then queuing work to the other processes.
>
> Thank you
>
> Quincey Koziol wrote:
>> Hi Roger,
>>
>> On Jun 1, 2010, at 3:36 PM, Roger Martin wrote:
>>
>>> Hi,
>>>
>>> I'm testing 1.8.4 and 1.8.5-pre2 on CentOS 5.4, NFS v3, 64-bit, with shared
>>> libraries (is there additional info that would be useful?). Both build
>>> successfully with:
>>> ----
>>> ./configure --prefix=/net/frodo/home/roger/Software/hdf5-1.8.5-pre2/dist
>>> --disable-fortran --enable-parallel
>>> --with-zlib=/net/frodo/home/roger/sourceforge/zlib-1.2.4/dist
>>> --without-szlib --enable-shared=yes --enable-static=no
>>>
>>> make
>>> #make check
>>> make install
>>> ----
>>>
>>> All processes do the collective H5Fcreate, and then rank 0 begins to
>>> create tables using the H5TBmake_table call while the others wait.
>>>
>>> 1) It gets through 28 tables and then stalls on the way to 196 tables.
>>> 2) If the chunk_size is changed from 10 to 1000, it gets through 44 tables
>>> and then stalls.
>>>
>> Are you making the H5TBmake_table() calls from only one process in a
>> parallel application? If so, that's certainly the problem.
>>
>> Quincey
>>
>>> I'll try to make a small app that reveals the same stall, but for now here
>>> is the portion of code that sets up the table:
>>> ----------
>>> typedef struct Particle
>>> {
>>>     char element[3];
>>>     char atomName[16];
>>>     char residueName[16];
>>>     double x;
>>>     double y;
>>>     double z;
>>>     int formalCharge;
>>> } Particle;
>>>
>>> Particle dst_buf[elements.size()];
>>>
>>> /* Calculate the size and the offsets of our struct members in memory */
>>> size_t dst_size = sizeof(Particle);
>>> size_t dst_offset[7] = {HOFFSET(Particle, element),
>>>                         HOFFSET(Particle, atomName),
>>>                         HOFFSET(Particle, residueName),
>>>                         HOFFSET(Particle, x),
>>>                         HOFFSET(Particle, y),
>>>                         HOFFSET(Particle, z),
>>>                         HOFFSET(Particle, formalCharge)};
>>>
>>> size_t dst_sizes[7] = {sizeof(dst_buf[0].element),
>>>                        sizeof(dst_buf[0].atomName),
>>>                        sizeof(dst_buf[0].residueName),
>>>                        sizeof(dst_buf[0].x),
>>>                        sizeof(dst_buf[0].y),
>>>                        sizeof(dst_buf[0].z),
>>>                        sizeof(dst_buf[0].formalCharge)};
>>>
>>> /* Define an array of Particles */
>>> Particle p_data[elements.size()];
>>> for (int index = 0; index < elements.size(); index++)
>>> {
>>>     std::strncpy(p_data[index].element, elements[index].c_str(), 3);
>>>     std::strncpy(p_data[index].atomName, atomNames[index].c_str(), 16);
>>>     std::strncpy(p_data[index].residueName, residueNames[index].c_str(), 16);
>>>     p_data[index].x = (*xyz)[0][index];
>>>     p_data[index].y = (*xyz)[1][index];
>>>     p_data[index].z = (*xyz)[2][index];
>>>     p_data[index].formalCharge = formalCharge[index];
>>> }
>>>
>>> hsize_t chunk_size = 10;
>>> int *fill_data = NULL;
>>> int compress = 1;
>>> if (parallel) compress = 0;
>>>
>>> /* Define field information */
>>> const char *field_names[7] = {"Element", "AtomName", "ResidueName", "X",
>>>                               "Y", "Z", "FormalCharge"};
>>> /* Initialize field_type */
>>> hid_t string_element_type = H5Tcopy(H5T_C_S1);
>>> hid_t string_type = H5Tcopy(H5T_C_S1);
>>> H5Tset_size(string_element_type, 3);
>>> H5Tset_size(string_type, 16);
>>> hid_t field_type[7];
>>> field_type[0] = string_element_type;
>>> field_type[1] = string_type;
>>> field_type[2] = string_type;
>>> field_type[3] = H5T_NATIVE_DOUBLE;
>>> field_type[4] = H5T_NATIVE_DOUBLE;
>>> field_type[5] = H5T_NATIVE_DOUBLE;
>>> field_type[6] = H5T_NATIVE_INT;
>>>
>>> std::cout << "create table " << datasetName << " " << elements.size() << std::endl;
>>> herr_t status = H5TBmake_table(title.c_str(), currentFileID,
>>>                                datasetName.c_str(), 7, elements.size(),
>>>                                dst_size, field_names, dst_offset, field_type,
>>>                                chunk_size, fill_data, compress, p_data);
>>> std::cout << "create table " << status << std::endl;
>>>
>>> ...
>>>
>>> H5Tclose(string_type);
>>> H5Tclose(string_element_type);
>>>
>>> ---------------
>>>
>>> The only things closed during this loop of 196 table-making calls are the
>>> string_type and string_element_type. There are two attributes set on the
>>> tables, but the same stall happens without these extra attributes too.
>>>
>>> _______________________________________________
>>> Hdf-forum is for HDF software users discussion.
>>> [email protected]
>>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
