On 03/21/2014 08:27 AM, Sven Reiche wrote:
Hi,
I am working on the following problem:
A code is producing about 20000 datasets, which should be placed pairwise into
groups, resulting in 10000 groups with 2 datasets each. Each pair of datasets
is computed by a single compute node, so only one node needs to write to a
given dataset, and it needs no data from other nodes. The size of one dataset
is about 500 kB.
500 kB for each of 20000 datasets (20000 × 500 kB = 10,000,000 kB): you're moving ~10 gigs of data.
My approach is the following:
1) I open the file in parallel mode.
2) All nodes loop over all groups and datasets, creating and closing them
collectively.
3) Then I loop over the calculation of the datasets as
for (int i = rank; i < 10000; i += size) { /* calculate the data of the dataset */ }
4) Each node writes exclusively into its own datasets, with the data transfer
mode set to INDEPENDENT.
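The loop in step 3 is a round-robin (cyclic) decomposition of the 10000 groups over the ranks. A minimal sketch of how the work divides up, assuming rank and size would come from MPI_Comm_rank/MPI_Comm_size (they are plain parameters here so it runs without MPI) and assuming group i holds datasets 2*i and 2*i + 1; the helper name groups_for_rank is hypothetical:

```c
/* Number of groups (pairs of datasets), from the post. */
#define NUM_GROUPS 10000

/* Count the groups one rank handles under the cyclic loop of step 3:
 * rank r takes groups r, r + size, r + 2*size, ...
 * Assumption: rank and size are what MPI_Comm_rank and MPI_Comm_size
 * would return; they are passed in so the sketch needs no MPI. */
static int groups_for_rank(int rank, int size)
{
    int count = 0;
    for (int i = rank; i < NUM_GROUPS; i += size) {
        /* Real code: compute datasets 2*i and 2*i + 1 (assumed pairing),
         * then write each one with an independent transfer. */
        count++;
    }
    return count;
}
```

With 3 ranks, rank 0 gets 3334 groups and ranks 1 and 2 get 3333 each, so the compute load is balanced; the I/O pattern, however, is thousands of small independent writes per rank, each to a different dataset.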
That's the idea. I am profiling the code with MPE, and it works OK for a small
number of nodes, but with more nodes it gets worse, much worse, up to the point
where it performs as if a single node were doing the calculation and writing
serially while the remaining nodes sit idle.
I am stuck now trying to get good performance that scales nicely with the
number of cores.
Any help or tips are appreciated.
Well, you're pumping 10 gigs of data through one node. That's not going
to scale.
I guess you could decompose your parallel writes over the datasets, but
I'm not sure how HDF5 updates the free-block list in that case.
Could you, instead of 20000 datasets, produce one dataset with an
additional dimension called, oh, "data id" maybe, of size 20000?
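To illustrate the single-dataset idea, here is a sketch that computes which rows ("data ids") of one combined dataset of shape [20000][row_len] a rank would write. It assumes all 20000 datasets have the same length row_len (hypothetical), keeps each pair of rows 2*g and 2*g + 1 on the same rank, and swaps the round-robin loop for a block decomposition so each rank owns one contiguous slab. The start/count arrays play the role of the hsize_t arrays one would pass to H5Sselect_hyperslab on the file dataspace before a collective H5Dwrite; the function name rows_for_rank is hypothetical.

```c
/* Number of groups (pairs of datasets), from the post. */
#define NUM_GROUPS 10000ULL

/* Block decomposition over groups: rank r owns one contiguous range of
 * groups; the first (NUM_GROUPS % size) ranks get one extra group.
 * Each group g maps to rows 2*g and 2*g + 1 of the combined dataset,
 * so a rank's rows are contiguous and pairs are never split.
 * start/count use {row, column} order, as a 2-D hyperslab would. */
static void rows_for_rank(int rank, int size, unsigned long long row_len,
                          unsigned long long start[2],
                          unsigned long long count[2])
{
    unsigned long long n = (unsigned long long)size;
    unsigned long long r = (unsigned long long)rank;
    unsigned long long base = NUM_GROUPS / n;   /* groups every rank gets */
    unsigned long long extra = NUM_GROUPS % n;  /* leftover groups */

    unsigned long long first_group = r * base + (r < extra ? r : extra);
    unsigned long long num_groups = base + (r < extra ? 1ULL : 0ULL);

    start[0] = 2 * first_group;  /* "data id" of the rank's first row */
    count[0] = 2 * num_groups;   /* both datasets of every owned group */
    start[1] = 0;                /* every rank writes whole rows */
    count[1] = row_len;
}
```

With 3 ranks, rank 0 writes rows 0 to 6667, rank 1 rows 6668 to 13333, and rank 2 rows 13334 to 19999: three large contiguous writes instead of 20000 small ones, which is the kind of access pattern collective MPI-IO handles well.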
Some HDF5 users like "poor man's parallel I/O". I think it's a horrible
architecture, but one must be pragmatic: it's a good workaround for
defective file systems. Perhaps you have one of those file systems?
https://visitbugs.ornl.gov/projects/hpc-hdf5/wiki/Poor_Man%27s_vs_Rich_Mans%27_Parallel_IO
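In its simplest form, as the term is commonly used, poor man's parallel I/O means every process writes its own separate file with the serial HDF5 library, so there is no shared file to coordinate. A minimal sketch of the only piece that needs thought, a per-rank file name; the "prefix_NNNNN.h5" scheme and the helper name per_rank_filename are hypothetical:

```c
#include <stdio.h>

/* Build a per-rank file name such as "output_00007.h5" for rank 7.
 * Assumption: each rank then opens this file on its own with the default
 * (serial) file driver. Returns 0 on success, -1 if buf is too small
 * or snprintf fails. */
static int per_rank_filename(char *buf, size_t len, const char *prefix, int rank)
{
    int n = snprintf(buf, len, "%s_%05d.h5", prefix, rank);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The price is one file per process, which later tools have to stitch back together.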
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
_______________________________________________
Hdf-forum is for HDF software users discussion.
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org