Alvaro Aguilera wrote:
> thanks for the hint, but unfortunately I can't make any updates to the
> cluster...
>
> Do you think both of the problems I experienced are bugs in Lustre and
> are resolved in current versions?

They should be Lustre bugs. Do the 2 processes run on different nodes or on the same node?

Thanks
WangDi

>
> Thanks.
> Alvaro.
>
> On Fri, Aug 21, 2009 at 6:32 AM, di wang <[email protected]> wrote:
>
> Hello,
>
> You may see bug 17197 and try to apply this patch
> https://bugzilla.lustre.org/attachment.cgi?id=25062 to your
> lustre src. Or you can wait 1.8.2.
>
> Thanks
> Wangdi
>
> Alvaro Aguilera wrote:
>
> Hello,
>
> as a project for college I'm doing a behavioral comparison
> between Lustre and CXFS when dealing with simple strided files
> using POSIX semantics. On one of the tests, each participating
> process reads 16 chunks of data with a size of 32MB each, from
> a common, strided file using the following code:
>
> ------------------------------------------------------------------------------------------
> int myfile = open("thefile", O_RDONLY);
>
> MPI_Barrier(MPI_COMM_WORLD); // the barriers are only to help measuring time
>
> off_t distance = (numtasks-1)*p.buffersize;
> off_t offset = rank*p.buffersize;
>
> int j;
> lseek(myfile, offset, SEEK_SET);
> for (j = 0; j < p.buffercount; j++) {
>     read(myfile, buffers[j], p.buffersize); // buffers are aligned to the page size
>     lseek(myfile, distance, SEEK_CUR);
> }
>
> MPI_Barrier(MPI_COMM_WORLD);
>
> close(myfile);
> ------------------------------------------------------------------------------------------
>
> I'm facing the following problem: when this code is run in
> parallel the read operations on certain processes start to
> need more and more time to complete. I attached a graphical
> trace of this, when using only 2 processes.
> As you see, the read operations on process 0 stay more or less
> constant, taking about 0.12 seconds to complete, while on
> process 1 they increase up to 39 seconds!
>
> If I run the program with only one process, then the time
> stays at ~0.12 seconds per read operation. The problem doesn't
> appear if the O_DIRECT flag is used.
>
> Can somebody explain to me why is this happening?
> Since I'm very new to Lustre, I may be making some silly mistakes, so be
> nice to me ;)
>
> I'm using Lustre SLES 10 Patchlevel 1, Kernel
> 2.6.16.54-0.2.5_lustre.1.6.5.1.
>
> Thanks!
>
> Alvaro Aguilera.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
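
For reference, the strided access pattern from the quoted test can also be written with explicit offsets. This is only a minimal sketch and not code from the thread: chunk_offset and read_chunk are hypothetical helper names, and the layout is assumed to match the posted loop, where chunk j of a given rank starts at (j*numtasks + rank)*buffersize. It uses pread() in place of the lseek()/read() pair and checks for short reads, which the posted loop does not do. It is not claimed to change the Lustre behaviour discussed above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Byte offset of chunk j for a given rank (hypothetical helper).
 * Assumes chunks are interleaved round-robin across ranks, as in the
 * posted loop: rank 0 chunk 0, rank 1 chunk 0, ..., rank 0 chunk 1, ... */
off_t chunk_offset(int rank, int numtasks, int j, off_t buffersize)
{
    assert(numtasks > 0 && rank >= 0 && rank < numtasks && j >= 0);
    return ((off_t)j * numtasks + rank) * buffersize;
}

/* Read exactly len bytes starting at offset, retrying after short reads
 * (hypothetical helper). Returns 0 on success, -1 on a read error or if
 * end of file is reached before len bytes were read. */
int read_chunk(int fd, void *buf, size_t len, off_t offset)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = pread(fd, p, len, offset);
        if (n <= 0)
            return -1;
        p += n;
        offset += n;
        len -= (size_t)n;
    }
    return 0;
}
```

In the posted loop this would replace the read()/lseek() pair with a call such as read_chunk(myfile, buffers[j], p.buffersize, chunk_offset(rank, numtasks, j, p.buffersize)), so that a failed or truncated read shows up as an error instead of going unnoticed in the timings.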
