I've been trying out patchless kernels and the attached simple code
appears to trigger a failure in Lustre 1.6.0.1. I couldn't see anything
in Bugzilla about it.
Typically I see 4+ open() failures out of 32 on the first run after the
Lustre filesystem is mounted. Often (but not always) the number of
failures drops to a few or zero on subsequent runs.
Typical output (where no output means success) looks like this:
% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
open of '/mnt/testfs/rjh/blk016.dat' failed on rank 16, hostname 'x15'
open: No such file or directory
open of '/mnt/testfs/rjh/blk018.dat' failed on rank 18, hostname 'x15'
open: No such file or directory
open of '/mnt/testfs/rjh/blk019.dat' failed on rank 19, hostname 'x15'
open: No such file or directory
open of '/mnt/testfs/rjh/blk022.dat' failed on rank 22, hostname 'x16'
open: No such file or directory
open of '/mnt/testfs/rjh/blk020.dat' failed on rank 20, hostname 'x16'
open: No such file or directory
open of '/mnt/testfs/rjh/blk023.dat' failed on rank 23, hostname 'x16'
open: No such file or directory
% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
open of '/mnt/testfs/rjh/blk014.dat' failed on rank 14, hostname 'x14'
open: No such file or directory
open of '/mnt/testfs/rjh/blk013.dat' failed on rank 13, hostname 'x14'
open: No such file or directory
open of '/mnt/testfs/rjh/blk015.dat' failed on rank 15, hostname 'x14'
open: No such file or directory
% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
That is 32 MPI processes across 6 nodes, each attempting to open and close
one file (32 files total) in a single directory over o2ib.
If I umount and remount the filesystem, the higher rate of errors
occurs again:
cexec :11-16 umount /mnt/testfs
cexec :11-16 /usr/sbin/lustre_rmmod ; cexec :11-16 /usr/sbin/lustre_rmmod
cexec :11-16 mount -t lustre [EMAIL PROTECTED]:/testfs /mnt/testfs
Note that the same failures happen over GigE too, but only in larger
tests, e.g. -np 64 or 128, so the extra speed of IB seems to be triggering
the bug sooner.
If the Lustre kernel RPMs (e.g. 2.6.9-42.0.10.EL_lustre-1.6.0.1smp) are used
instead of the patchless kernels, then I don't see any failures; tested
out to -np 512.
Patchless 2.6.19.7 and 2.6.21.5 give failures at about the same rate.
Modules for 2.6.19.7 were built using the standard Lustre 1.6.0.1
tarball, and the 2.6.21.5 modules were built using
http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/lustre/1.6/lustre-1.6.0.1-ql3.tar.bz2
as that's a lot easier to work with than
https://bugzilla.lustre.org/show_bug.cgi?id=11647
The Lustre setup is:
1 OSS node with 2 OSTs, each md raid0 SAS, running 2.6.9-42.0.10.EL_lustre-1.6.0.1smp
1 MDS node with the MDT on a 3G ramdisk, same kernel
6 client nodes
no Lustre striping; lnet debugging left at the default
all nodes are dual dual-core Xeon x86_64 running CentOS 4.5
nodes boot diskless via oneSIS
Another data point: if I rm all the files in the directory then the
test succeeds more often (up until the filesystem is umounted and
remounted), so something about the unlink/create combination might be the
problem. E.g.
% cexec :11 rm /mnt/testfs/rjh/'*'
% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
# the above both succeed. umount and remount the fs as per above, then:
% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh
open of '/mnt/testfs/rjh/blk014.dat' failed on rank 14, hostname 'x14'
open: No such file or directory
open of '/mnt/testfs/rjh/blk015.dat' failed on rank 15, hostname 'x14'
open: No such file or directory
open of '/mnt/testfs/rjh/blk010.dat' failed on rank 10, hostname 'x13'
open: No such file or directory
...
Please let me know if you'd like me to re-run anything with a different
setup or try different kernels.
cheers,
robin
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <mpi.h>
int main(int nargs, char** argv)
{
    int myRank;
    char fname[128];
    int fd;
    int mpiErr, closeErr;
    char name[64];

    mpiErr = MPI_Init(&nargs, &argv);
    if ( mpiErr ) perror( "MPI_Init" );
    mpiErr = MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    if ( mpiErr ) perror( "MPI_Comm_rank" );

    gethostname(name, sizeof(name));

    /* each rank creates, opens and closes one file of its own in the
     * directory given as argv[1] */
    sprintf(fname, "%s/blk%03d.dat", argv[1], myRank);
    fd = open(fname, (O_RDWR | O_CREAT | O_TRUNC), 0640);
    if ( fd == -1 ) {
        fprintf(stderr, "open of '%s' failed on rank %d, hostname '%s'\n",
                fname, myRank, name);
        perror("open");
    }
    else {
        closeErr = close(fd);
        if ( closeErr ) {
            fprintf(stderr, "close of '%s' failed on rank %d\n",
                    fname, myRank);
            perror("close");
        }
    }

    mpiErr = MPI_Finalize();
    if ( mpiErr ) perror( "MPI_Finalize" );
    exit(0);
}
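
For reference, a minimal way to build and run the reproducer (assuming Open
MPI's mpicc sits alongside the mpirun used above; the hostfile and target
directory are just the ones from the runs shown):

% /opt/openmpi/1.2/bin/mpicc -o open open.c
% /opt/openmpi/1.2/bin/mpirun --hostfile hosts -np 32 ./open /mnt/testfs/rjh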