thanks for the hint, but unfortunately I can't make any updates to the cluster...
Do you think both of the problems I experienced are bugs in Lustre and are resolved in current versions?

Thanks.
Alvaro.

On Fri, Aug 21, 2009 at 6:32 AM, di wang <[email protected]> wrote:

> Hello,
>
> You may see bug 17197 and try to apply this patch
> https://bugzilla.lustre.org/attachment.cgi?id=25062 to your lustre src.
> Or you can wait 1.8.2.
>
> Thanks
> Wangdi
>
> Alvaro Aguilera wrote:
>
>> Hello,
>>
>> as a project for college I'm doing a behavioral comparison between Lustre
>> and CXFS when dealing with simple strided files using POSIX semantics. On
>> one of the tests, each participating process reads 16 chunks of data with a
>> size of 32MB each, from a common, strided file using the following code:
>>
>> ------------------------------------------------------------------------------------------
>> int myfile = open("thefile", O_RDONLY);
>>
>> MPI_Barrier(MPI_COMM_WORLD); // the barriers are only to help measuring time
>>
>> off_t distance = (numtasks-1)*p.buffersize;
>> off_t offset = rank*p.buffersize;
>>
>> int j;
>> lseek(myfile, offset, SEEK_SET);
>> for (j = 0; j < p.buffercount; j++) {
>>     read(myfile, buffers[j], p.buffersize); // buffers are aligned to the page size
>>     lseek(myfile, distance, SEEK_CUR);
>> }
>>
>> MPI_Barrier(MPI_COMM_WORLD);
>>
>> close(myfile);
>> ------------------------------------------------------------------------------------------
>>
>> I'm facing the following problem: when this code is run in parallel the
>> read operations on certain processes start to need more and more time to
>> complete. I attached a graphical trace of this, when using only 2 processes.
>> As you see, the read operations on process 0 stay more or less constant,
>> taking about 0.12 seconds to complete, while on process 1 they increase up
>> to 39 seconds!
>>
>> If I run the program with only one process, then the time stays at ~0.12
>> seconds per read operation. The problem doesn't appear if the O_DIRECT flag
>> is used.
>>
>> Can somebody explain to me why is this happening? Since I'm very new to
>> Lustre, I may be making some silly mistakes, so be nice to me ;)
>>
>> I'm using Lustre SLES 10 Patchlevel 1, Kernel
>> 2.6.16.54-0.2.5_lustre.1.6.5.1.
>>
>> Thanks!
>>
>> Alvaro Aguilera.
>>
>> ------------------------------------------------------------------------
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
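For reference, the strided access pattern in the quoted test can be written with explicit offsets and checked return values. This is only a sketch and not a fix for the slowdown discussed above. It assumes chunk j of a given rank starts at (j*numtasks + rank)*buffersize, which is what the lseek/read loop in the quoted code works out to. The helper names strided_offset and read_chunk are hypothetical, and MPI is left out so the fragment stands alone.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Byte offset of chunk j for the given rank (hypothetical helper).
 * Equivalent to starting at rank*buffersize and skipping
 * (numtasks-1)*buffersize after every read of buffersize bytes. */
off_t strided_offset(int rank, int numtasks, int j, size_t buffersize)
{
    assert(rank >= 0 && rank < numtasks && j >= 0);
    return ((off_t)j * numtasks + rank) * (off_t)buffersize;
}

/* Read one chunk at an explicit offset with pread(), retrying on short
 * reads and EINTR (hypothetical helper). Returns the number of bytes
 * read (less than size only at end of file), or -1 on error. */
ssize_t read_chunk(int fd, void *buf, size_t size, off_t offset)
{
    size_t done = 0;
    while (done < size) {
        ssize_t n = pread(fd, (char *)buf + done, size - done,
                          offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, try again */
            return -1;      /* real error, errno is set by pread() */
        }
        if (n == 0)
            break;          /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

In the loop from the quoted message, each iteration would then call read_chunk(myfile, buffers[j], p.buffersize, strided_offset(rank, numtasks, j, p.buffersize)) and compare the result against p.buffersize, so a short or failed read shows up in the timing data instead of being silently counted as a full read.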
