Thanks to Adam and Andreas for providing their experiences
with MPI-IO! Adam, I would also be interested in getting
a few more details (# of processes, file size, etc.) if
you wouldn't mind.

Andreas, I am not sure what "write(%dev)(%opt) read(%dev)(%opt)"
means in the results you provided. In particular, what is
%dev?

Thanks again!

Phil Dickens


On Mon, 9 Jul 2007, Andreas Dilger wrote:

On Jul 09, 2007  13:53 -0600, Adam Boggs wrote:
We've been testing Lustre 1.6.0.1 with MPI-IO (using the mpi-io-test
benchmark that comes with PVFS2 and NCAR's POP-IO test) on our BlueGene
and have seen very poor performance: ~10 MB/s.  IOR shows similar
results when writing to a single shared file.  Telling IOR to write
separate files per process shows great scalability, but our MPI-IO apps
don't work that way.  We've also seen similar MPI-IO performance at
Livermore running Lustre 1.4 on a regular Linux cluster (10-20 MB/s).
We don't see this with the same tests running on our GPFS or PVFS2
file systems.
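
For reference, the single-shared-file pattern at issue looks roughly like
the sketch below: every rank writes its own contiguous 1 MB blocks into one
common file.  The path, block count, and offsets are illustrative only, not
taken from mpi-io-test, POP-IO, or IOR.

/* Sketch of a single-shared-file write pattern: each rank writes its own
 * contiguous 1 MB blocks into one common file.  Path and sizes are
 * illustrative, not taken from the actual benchmarks. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define XFER   (1024 * 1024)   /* 1 MB per write, similar to the tests */
#define NXFERS 16              /* small block count, for illustration  */

int main(int argc, char **argv)
{
    int rank;
    MPI_File fh;
    char *buf = malloc(XFER);
    memset(buf, 0, XFER);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File_open(MPI_COMM_WORLD, "/mnt/lustre/shared_file",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    for (int i = 0; i < NXFERS; i++) {
        /* rank r owns the contiguous region [r*NXFERS*XFER, (r+1)*NXFERS*XFER) */
        MPI_Offset off = ((MPI_Offset)rank * NXFERS + i) * XFER;
        MPI_File_write_at_all(fh, off, buf, XFER, MPI_BYTE, MPI_STATUS_IGNORE);
    }

    MPI_File_close(&fh);
    MPI_Finalize();
    free(buf);
    return 0;
}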

I haven't tracked this down too far yet, so if anyone has suggestions of
things to check, or similar/different experiences, I'd love to hear about
it.  The IO sizes seem reasonably large (~1MB) and there don't appear to
be any client evictions.  I did try the -o localflock mount option patch
for 1.6 since I know MPI-IO flocks regions, but haven't had a chance to
fully benchmark that yet.
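
To illustrate the locking concern, the sketch below shows the kind of
fcntl() byte-range lock an MPI-IO layer such as ROMIO may take around a
shared-file write (e.g. for data sieving or atomic mode).  Whether such
locks are actually taken in this workload is an assumption, and the path
and sizes are illustrative.  With -o localflock these locks are honored
only within a single client node, which avoids cluster-wide lock traffic
but is not coherent across nodes.

/* Sketch of the kind of fcntl() byte-range lock an MPI-IO layer (e.g.
 * ROMIO during data sieving or atomic-mode writes) may take around a
 * shared-file write.  Assumption for illustration only; path and sizes
 * are not from the actual tests. */
#include <fcntl.h>
#include <unistd.h>

static int lock_range(int fd, off_t off, off_t len, short type)
{
    struct flock fl = { 0 };
    fl.l_type   = type;          /* F_WRLCK before the write, F_UNLCK after */
    fl.l_whence = SEEK_SET;
    fl.l_start  = off;
    fl.l_len    = len;
    return fcntl(fd, F_SETLKW, &fl);   /* blocking byte-range lock */
}

int main(void)
{
    static char buf[1 << 20];          /* ~1 MB transfer, zero-filled */
    int fd = open("/mnt/lustre/shared_file", O_RDWR | O_CREAT, 0644);

    lock_range(fd, 0, sizeof(buf), F_WRLCK);
    pwrite(fd, buf, sizeof(buf), 0);
    lock_range(fd, 0, sizeof(buf), F_UNLCK);

    close(fd);
    return 0;
}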

Hi Adam, can you post more specifics of your benchmark?  In our testing
at LLNL we didn't see any such problems with IOR, though it is possible
that the IOR parameters are "too well behaved" or something.  The one
critical issue is that for shared-file tests you need to set the striping
on the file (or its parent directory) so that it spans all OSTs.  We are
working to integrate this properly with MPI-IO so that shared output
files are created with wide striping automatically.
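
One way to request that wide striping from the application side is to pass
hints at file-creation time.  The sketch below assumes a ROMIO-based MPI-IO
layer whose file-system driver honors the standard "striping_factor" and
"striping_unit" hints, which may not yet be the case in every stack; the
path and values are illustrative.

/* Sketch: request wide striping at file-creation time via MPI-IO hints.
 * Assumes an MPI-IO layer that recognizes the ROMIO "striping_factor"
 * and "striping_unit" hints when the output file is first created on
 * Lustre.  Path and values are illustrative. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "32");    /* stripe across 32 OSTs */
    MPI_Info_set(info, "striping_unit", "1048576"); /* 1 MB stripe size      */

    /* Striping can only be set when the file is created, so create the
     * shared output file fresh rather than reusing an existing one. */
    MPI_File_open(MPI_COMM_WORLD, "/mnt/lustre/shared_output",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}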

As an example of what we see, here are some results from overnight IOR testing (32 OSTs, Elan3 + GigE):

POSIX, file-per-process:
----------------------------------------------------------------------
tasks  stripe    xfer bytes/              rates (MB/s)          sample
(CPUs) ct  size  size task   write(%dev)(%opt) read(%dev)(%opt) count
----------------------------------------------------------------------
  2(2)  2    1M    2M    2G     146( 3)(20)      248(30)(34)      4
 32(2)  2    1M    2M    2G    1656( 1)(14)     1934( 1)(17)      4
256(2)  2    1M    2M    2G    3075( 1)(25)     2104( 3)(17)      4
770(2)  2    1M    2M    2G    2991( 0)(24)     2031( 0)(16)      1

POSIX, single-shared-file:
----------------------------------------------------------------------
tasks  stripe    xfer bytes/              rates (MB/s)          sample
(CPUs) ct  size  size task   write(%dev)(%opt) read(%dev)(%opt) count
----------------------------------------------------------------------
  2(2) 32    1M    2M    2G      93( 0)( 2)      148( 2)( 3)      4
 32(2) 32    1M    2M    2G    1260( 1)(22)     1951( 4)(34)      3
256(2) 32    1M    2M    2G    3024( 1)(53)     2035( 0)(35)      3
792(2) 32    1M    2M    2G    3026( 0)(53)     1925( 0)(33)      1

MPIIO, file-per-process:
----------------------------------------------------------------------
tasks  stripe    xfer bytes/              rates (MB/s)          sample
(CPUs) ct  size  size task   write(%dev)(%opt) read(%dev)(%opt) count
----------------------------------------------------------------------
  2(2)  2    1M    2M    2G     147( 1)(20)      227(27)(32)      4
 32(2)  2    1M    2M    2G    1616( 2)(14)     1923( 2)(17)      4
256(2)  2    1M    2M    2G    2695( 6)(22)     1982( 1)(16)      4
798(2)  2    1M    2M    2G    2881( 0)(23)     2078( 0)(17)      1

MPIIO, single-shared-file:
----------------------------------------------------------------------
tasks  stripe    xfer bytes/              rates (MB/s)          sample
(CPUs) ct  size  size task   write(%dev)(%opt) read(%dev)(%opt) count
----------------------------------------------------------------------
  2(2) 32    1M    2M    2G      82( 1)( 1)      201(29)( 3)      4
 32(2) 32    1M    2M    2G    1331( 1)(23)     1990( 3)(35)      4
256(2) 32    1M    2M    2G    2980( 1)(52)     2006( 2)(35)      4
792(2) 32    1M    2M    2G    3053( 0)(53)     2016( 0)(35)      1


Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


