I've been trying out patchless client kernels, and the attached simple
test program appears to trigger a failure in Lustre 1.6.0.1. I couldn't
find anything in bugzilla about it.

Typically I see 4+ open() failures out of 32 on the first run after the
Lustre filesystem is mounted. Often (but not always) the number of
failures drops to just a few, or zero, on subsequent runs.

e.g. typical output (where no output means success) would be:
  % /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
 open of '/mnt/testfs/rjh/blk016.dat' failed on rank 16, hostname 'x15'
 open: No such file or directory
 open of '/mnt/testfs/rjh/blk018.dat' failed on rank 18, hostname 'x15'
 open: No such file or directory
 open of '/mnt/testfs/rjh/blk019.dat' failed on rank 19, hostname 'x15'
 open: No such file or directory
 open of '/mnt/testfs/rjh/blk022.dat' failed on rank 22, hostname 'x16'
 open: No such file or directory
 open of '/mnt/testfs/rjh/blk020.dat' failed on rank 20, hostname 'x16'
 open: No such file or directory
 open of '/mnt/testfs/rjh/blk023.dat' failed on rank 23, hostname 'x16'
 open: No such file or directory
  % /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
 open of '/mnt/testfs/rjh/blk014.dat' failed on rank 14, hostname 'x14'
 open: No such file or directory
 open of '/mnt/testfs/rjh/blk013.dat' failed on rank 13, hostname 'x14'
 open: No such file or directory
 open of '/mnt/testfs/rjh/blk015.dat' failed on rank 15, hostname 'x14'
 open: No such file or directory
  % /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
  % /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh

That is 32 MPI processes across 6 nodes, each attempting to open and
close one file (32 files total) in a single directory, over o2ib.

If I umount and remount the filesystem, the higher error rate occurs
again:
  cexec :11-16 umount /mnt/testfs
  cexec :11-16 /usr/sbin/lustre_rmmod ; cexec :11-16 /usr/sbin/lustre_rmmod
  cexec :11-16 mount -t lustre [EMAIL PROTECTED]:/testfs /mnt/testfs

Note that the same failures happen over GigE too, but only with larger
tests, e.g. -np 64 or 128, so the extra speed of IB is presumably just
triggering the bug sooner.

If the patched Lustre kernel rpms (e.g. 2.6.9-42.0.10.EL_lustre-1.6.0.1smp)
are used instead of the patchless kernels, then I don't see any
failures; tested out to -np 512.

Patchless 2.6.19.7 and 2.6.21.5 give failures at about the same rate.
Modules for 2.6.19.7 were built from the standard Lustre 1.6.0.1
tarball, and the 2.6.21.5 modules were built from
  
http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/lustre/1.6/lustre-1.6.0.1-ql3.tar.bz2
as that's a lot easier to work with than
  https://bugzilla.lustre.org/show_bug.cgi?id=11647

Lustre setup is:
  1 OSS node with 2 OSTs, each md raid0 SAS, 2.6.9-42.0.10.EL_lustre-1.6.0.1smp
  1 MDS node with the MDT on a 3G ramdisk, same kernel as the OSS
  6 client nodes
  no Lustre striping, lnet debugging left at the default
  all nodes are dual dual-core Xeon x86_64 CentOS4.5
  nodes are booting diskless oneSIS

Another data point: if I rm all the files in the directory, then the
test succeeds more often (up until the time the fs is umount'd and
remounted), so something about the unlink/create combination might be
the problem (a minimal sketch of just that sequence follows the output
below). e.g.
  % cexec :11 rm /mnt/testfs/rjh/'*'
  % /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
  % /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
# the above both succeed. umount and remount the fs as per above, then:
  % /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
 open of '/mnt/testfs/rjh/blk014.dat' failed on rank 14, hostname 'x14'
 open: No such file or directory
 open of '/mnt/testfs/rjh/blk015.dat' failed on rank 15, hostname 'x14'
 open: No such file or directory
 open of '/mnt/testfs/rjh/blk010.dat' failed on rank 10, hostname 'x13'
 open: No such file or directory
 ...
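In case it's useful for narrowing that down, here's a minimal non-MPI
sketch of the same unlink-then-create sequence run serially on a single
client. It's a hypothetical standalone reproducer, not the program that
produced the output above (that one is attached at the end of this
mail), and since it serialises everything on one node it may well not
hit the bug at all; the file names and the 32-file loop just mirror the
attached MPI test.

/*
 * minimal non-MPI sketch of the suspected unlink -> create sequence,
 * run serially on a single client.  this is just an illustration,
 * not the program that produced the output above.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        char fname[256];
        int i, fd;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <lustre directory>\n", argv[0]);
                exit(1);
        }

        for (i = 0; i < 32; i++) {
                snprintf(fname, sizeof(fname), "%s/blk%03d.dat", argv[1], i);

                /* remove any old copy; ENOENT here is expected and harmless */
                if (unlink(fname) == -1 && errno != ENOENT)
                        perror("unlink");

                /* recreate with the same flags as the MPI test */
                fd = open(fname, O_RDWR | O_CREAT | O_TRUNC, 0640);
                if (fd == -1) {
                        fprintf(stderr, "open of '%s' failed: %s\n",
                                fname, strerror(errno));
                        continue;
                }
                if (close(fd))
                        perror("close");
        }
        exit(0);
}

Compile it with plain gcc and point it at a directory on the Lustre
mount, e.g. a binary called ./unlink_create run against /mnt/testfs/rjh
(name arbitrary).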

Please let me know if you'd like me to re-run anything with a different
setup, try different kernels, etc.

cheers,
robin
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <mpi.h>

int main(int nargs, char** argv)
{
        int myRank;
        char fname[128];
        int fp;
        int mpiErr, closeErr;
        char name[64];

        mpiErr = MPI_Init(&nargs, &argv);
        if ( mpiErr ) perror( "MPI_Init" );
        mpiErr = MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
        if ( mpiErr ) perror( "MPI_Comm_rank" );
        gethostname(name, sizeof(name));

        if ( nargs < 2 ) {
                fprintf(stderr, "usage: %s <directory>\n", argv[0]);
                MPI_Abort(MPI_COMM_WORLD, 1);
        }

        /* each rank gets its own file: <dir>/blk000.dat, blk001.dat, ... */
        snprintf(fname, sizeof(fname), "%s/blk%03d.dat", argv[1], myRank);

        /* O_CREAT is set, so ENOENT from this open should not be possible */
        fp = open(fname, (O_RDWR | O_CREAT | O_TRUNC), 0640 );
        if ( fp == -1 ) {
                fprintf(stderr, "open of '%s' failed on rank %d, hostname '%s'\n",
                        fname, myRank, name);
                perror("open");
        }
        else {
                closeErr = close(fp);
                if ( closeErr ) {
                        fprintf(stderr, "close of '%s' failed on rank %d\n",
                                fname, myRank);
                        perror("close");
                }
        }

        mpiErr = MPI_Finalize();
        if ( mpiErr ) perror( "MPI_Finalize" );

        exit(0);
}