Thanks for your reply, Rob.

We do have the stripe count set to use all available OSTs, as well as the 
Lustre flags for Intel MPI.

With the Lustre optimizations turned on, we still see collective IO top out at 
1 GB/sec regardless of the number of machines, while independent IO scales and 
performs as we would expect. We have also noticed that the code seems to spend 
a lot of time in MPI_Allreduce.

I am writing to an X by Y by T dataset. Each node writes X/nodes slices of 
size Y by T, and each node's slices are adjacent in the file, so essentially 
every node is doing large, sequential IO to its own part of the file.
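
To make the pattern concrete, the selection each rank makes looks roughly like 
the sketch below (not our actual code; the double element type and the 
assumption that X divides evenly by the number of ranks are just for 
illustration):

#include <hdf5.h>

/* Minimal sketch of the pattern described above: each rank selects a
 * contiguous (X/nranks) x Y x T block of the X x Y x T dataset and
 * writes it with one H5Dwrite.  Assumes X divides evenly by nranks. */
void write_slab(hid_t dset, hid_t filespace, const double *buf,
                hsize_t X, hsize_t Y, hsize_t T,
                int rank, int nranks, hid_t dxpl)
{
    hsize_t count[3]  = { X / nranks, Y, T };           /* this rank's block */
    hsize_t offset[3] = { rank * (X / nranks), 0, 0 };  /* where it starts   */

    /* select this rank's contiguous slab in the file */
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);

    /* matching in-memory dataspace */
    hid_t memspace = H5Screate_simple(3, count, NULL);

    /* dxpl chooses independent vs. collective transfer */
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Sclose(memspace);
}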

I understand that collective IO should hurt this access pattern somewhat, but 
it should not hurt to the degree we are seeing.

We are not likely to use collective IO, but we would like to find a resource 
that explains how collective IO actually works today.
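
For anyone else comparing the two modes: as far as I understand it, switching 
is just a setting on the dataset transfer property list, and the newer 
1.8-series query routines should report what the collective path actually did. 
Roughly (the helper names are just for this sketch):

#include <hdf5.h>
#include <stdint.h>

/* Sketch: pick the transfer mode on the property list, then ask HDF5
 * what it actually did after the write.  The query routines were added
 * during the 1.8 series. */
hid_t make_dxpl(int use_collective)
{
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, use_collective ? H5FD_MPIO_COLLECTIVE
                                          : H5FD_MPIO_INDEPENDENT);
    return dxpl;
}

void report_actual_io(hid_t dxpl)   /* call after the H5Dwrite */
{
    H5D_mpio_actual_io_mode_t mode;
    uint32_t local_cause, global_cause;

    H5Pget_mpio_actual_io_mode(dxpl, &mode);       /* did it really go collective? */
    H5Pget_mpio_no_collective_cause(dxpl,
                                    &local_cause,
                                    &global_cause); /* and if not, why */
}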

Thanks!

-----Original Message-----
From: Hdf-forum [mailto:[email protected]] On Behalf Of Rob 
Latham
Sent: Thursday, June 12, 2014 11:39 AM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] phdf 5 independent vs collective IO



On 06/06/2014 08:51 AM, Zmick, David wrote:
> Hello All,
>
> I'm having some difficulty understanding how performance should differ 
> between independent and collective IO.
>
> At the moment, I'm trying to write regular hyperslabs that span an 
> entire 40GB dataset (writing to lustre, Intel MPI). Independent IO 
> seems to be quite a bit faster (30 second difference on 64 machines). 
> What factors might be contributing to this difference in performance?

While much of Intel MPI is based on MPICH, I cannot say for certain what 
Lustre optimizations they have enabled -- if any.

First, ensure the stripe count for your Lustre file is larger than the default 
of 4. For parallel file access, you should stripe across all OSTs.
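
If you would rather set the striping from the application than with lfs 
setstripe, passing MPI-IO hints through the file access property list should 
work, roughly like the sketch below. The hint names are the usual ROMIO ones; 
whether your MPI honors them is worth verifying, and they only take effect 
when the file is created. The values are examples.

#include <hdf5.h>
#include <mpi.h>

/* Sketch: pass Lustre striping hints to the MPI-IO layer via the file
 * access property list at file creation time. */
hid_t create_striped_file(const char *name, MPI_Comm comm)
{
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "64");      /* stripe count, e.g. all OSTs */
    MPI_Info_set(info, "striping_unit",   "1048576"); /* 1 MiB stripe size           */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, info);

    hid_t file = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    H5Pclose(fapl);
    MPI_Info_free(&info);
    return file;
}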

It looks like Intel MPI requires an additional environment variable to enable 
fs-specific optimizations: section 3.5.8 of the Intel MPI Reference Manual 
suggests you do the following:

* Set the I_MPI_EXTRA_FILESYSTEM environment variable to "on" to enable 
parallel file system support

* Set the I_MPI_EXTRA_FILESYSTEM_LIST environment variable to "lustre" 
for the Lustre-optimized driver

https://software.intel.com/sites/products/documentation/hpc/ics/icsxe2013sp1/lin/icsxe_gsg_files/How_to_Use_the_Environment_Variables_I_MPI_EXTRA_FILESYSTEM_and_I_MPI_EXTRA_FILESYSTEM_LIST.htm
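
One thing worth ruling out is the launcher not exporting those variables to 
the remote ranks; a quick check from inside the application might look like 
this (just a sketch):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: confirm the Intel MPI filesystem variables actually reach
 * every rank; launchers do not always export the local environment. */
static void check_fs_env(void)
{
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const char *fs   = getenv("I_MPI_EXTRA_FILESYSTEM");
    const char *list = getenv("I_MPI_EXTRA_FILESYSTEM_LIST");
    if (fs == NULL || list == NULL)
        fprintf(stderr, "rank %d: I_MPI_EXTRA_FILESYSTEM%s set, "
                        "I_MPI_EXTRA_FILESYSTEM_LIST%s set\n",
                rank, fs ? "" : " not", list ? "" : " not");
}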

If you use the "tuned for lustre" option, do you see better performance?

thanks
==rob


>
> Also, in both cases I seem to be getting a strange slowdown at 32 
> machines. In almost all my tests, 16 and 64 machines both perform 
> better than 32.


>
> Thanks! David
>
>
>

--
Rob Latham
Mathematics and Computer Science Division Argonne National Lab, IL USA

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
