On 03/21/2014 08:27 AM, Sven Reiche wrote:
Hi,

I am working on the following problem:
A code produces about 20000 datasets, which should be placed pairwise into
groups, resulting in 10000 groups with 2 datasets each. Each pair of datasets
is calculated by one compute node, so only one node needs to write to a given
dataset, without any data from other nodes. The size of one dataset is about
500 kB.


500 kB times 20000 datasets: you're moving roughly 10 GB of data.

My approach for doing so is the following:

1) I open the file in parallel mode.
2) All nodes loop over all groups and datasets, creating and closing them
collectively.
3) Then I loop over the calculation of the datasets as
for (int i = rank; i < 10000; i += size) { /* calculate the data of dataset i */ }
4) Each node writes exclusively into its own datasets, with the transfer
property list set to INDEPENDENT.
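
For concreteness, a minimal C sketch of steps 1-4 (not the actual code: the
file name, group/dataset names, element type, and the dummy calculation are
all assumed placeholders) might look like this:

/* Minimal sketch of the pattern in steps 1-4 above; names and sizes are
 * placeholders, and the "calculation" is a stand-in. */
#include <hdf5.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NGROUPS 10000
#define NVALUES 62500            /* ~500 kB of doubles per dataset (assumed) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* 1) open the file with the MPI-IO driver */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("pairs.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    H5Pclose(fapl);

    /* 2) every rank creates (and closes) all groups and datasets collectively */
    hsize_t dims[1] = { NVALUES };
    hid_t space = H5Screate_simple(1, dims, NULL);
    for (int g = 0; g < NGROUPS; g++) {
        char name[64];
        snprintf(name, sizeof(name), "/group%05d", g);
        hid_t grp = H5Gcreate2(file, name, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        hid_t d0  = H5Dcreate2(grp, "data0", H5T_NATIVE_DOUBLE, space,
                               H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        hid_t d1  = H5Dcreate2(grp, "data1", H5T_NATIVE_DOUBLE, space,
                               H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Dclose(d0); H5Dclose(d1); H5Gclose(grp);
    }

    /* 3)+4) each rank computes its pairs and writes them independently */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);
    double *a = malloc(NVALUES * sizeof *a);
    double *b = malloc(NVALUES * sizeof *b);
    for (int g = rank; g < NGROUPS; g += size) {
        for (int k = 0; k < NVALUES; k++) {   /* stand-in for the real calculation */
            a[k] = g + k;
            b[k] = g - k;
        }
        char name[64];
        snprintf(name, sizeof(name), "/group%05d/data0", g);
        hid_t d0 = H5Dopen2(file, name, H5P_DEFAULT);
        H5Dwrite(d0, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, a);
        H5Dclose(d0);
        snprintf(name, sizeof(name), "/group%05d/data1", g);
        hid_t d1 = H5Dopen2(file, name, H5P_DEFAULT);
        H5Dwrite(d1, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, b);
        H5Dclose(d1);
    }

    free(a); free(b);
    H5Pclose(dxpl); H5Sclose(space); H5Fclose(file);
    MPI_Finalize();
    return 0;
}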

So far the idea. I am profiling the code with MPE; it works OK for a small
number of nodes, but with more nodes it gets worse, much worse, up to the
point where it would be faster to do the calculation and serial writing on a
single node while the remaining nodes sit idle.

I am stuck now on getting good performance that scales nicely with the number
of cores.

Any help or tips are appreciated

Well, you're pumping 10 GB of data through one node. That's not going to scale.

I guess you could decompose your parallel writes over the datasets, but I'm not sure how HDF5 updates the free-blocks list in that case.

Could you, instead of 20000 datasets, produce one dataset with an additional dimension called, oh, "data id" maybe, of size 20000?
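
A rough sketch of what that could look like, assuming double data and the
hypothetical names below (file, dxpl, buf, data_id, and NVALUES are taken to
exist in the surrounding code):

/* Hypothetical single 2-D dataset: the first dimension is the "data id".
 * Each rank writes the row(s) it owns via a hyperslab selection. */
hsize_t dims[2]   = { 20000, NVALUES };
hid_t   filespace = H5Screate_simple(2, dims, NULL);
hid_t   dset = H5Dcreate2(file, "/all_data", H5T_NATIVE_DOUBLE, filespace,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

hsize_t start[2] = { (hsize_t)data_id, 0 };   /* row owned by this rank */
hsize_t count[2] = { 1, NVALUES };
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);

hid_t memspace = H5Screate_simple(1, &count[1], NULL);
H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

H5Sclose(memspace); H5Sclose(filespace); H5Dclose(dset);

With everything in one dataset, you could also switch the transfer property
list to H5FD_MPIO_COLLECTIVE later without restructuring the file, since all
ranks would be writing to the same dataset.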

Some HDF5 users like "poor man's parallel I/O", where each process writes its own separate file. I think it's a horrible architecture, but one must be pragmatic: it's a good workaround for defective file systems. Perhaps you have one of those file systems?

https://visitbugs.ornl.gov/projects/hpc-hdf5/wiki/Poor_Man%27s_vs_Rich_Mans%27_Parallel_IO
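
For reference, that scheme boils down to each rank writing its own serial
HDF5 file; a tiny sketch (the file-name pattern and contents are hypothetical):

/* "Poor man's" variant: one serial HDF5 file per MPI rank, no MPI-IO driver. */
char fname[64];
snprintf(fname, sizeof(fname), "pairs_rank%05d.h5", rank);
hid_t file = H5Fcreate(fname, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
for (int g = rank; g < NGROUPS; g += size) {
    /* ... create group g and write its two datasets with plain serial calls ... */
}
H5Fclose(file);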

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
