hi, here is the requested information:
Before the test:

llite.fastfs-ffff810102a6a400.read_ahead_stats=
snapshot_time:                   1251851453.382275 (secs.usecs)
pending issued pages:            0
hits                             7301235
misses                           10546
readpage not consecutive         14369
miss inside window               1
failed grab_cache_page           6285314
failed lock match                0
read but discarded               98955
zero length file                 0
zero size window                 3495
read-ahead to EOF                172
hit max r-a issue                783042
wrong page from grab_cache_page  0

After the test:

llite.fastfs-ffff810102a6a400.read_ahead_stats=
snapshot_time:                   1251851620.183964 (secs.usecs)
pending issued pages:            0
hits                             7506005
misses                           330064
readpage not consecutive         14432
miss inside window               319450
failed grab_cache_page           6322954
failed lock match                17294
read but discarded               98955
zero length file                 0
zero size window                 3495
read-ahead to EOF                192
hit max r-a issue                837908
wrong page from grab_cache_page  0

There seem to be a lot of misses (10546 before, 330064 after), as well as a
locking problem (failed lock match went from 0 to 17294), doesn't there?

Btw., in the test, 4 processes each read 512 MB from a 2 GB file.

Regards,
Alvaro.

On Fri, Aug 21, 2009 at 3:38 PM, di wang <[email protected]> wrote:

> hello,
> Alvaro Aguilera wrote:
>
>> they run on different physical nodes and access the ost via 4x infiniband.
>>
>> I never heard such problems, if they on different nodes. Client memory?
> Can you post read-ahead stats (before and after the test) here by
>
> lctl get_param llite.*.read_ahead_stats
>
> But there are indeed a lot fixes about stride read since 1.6.5, which is
> included in the tar ball I posted below.
> And it probably can fix your problem.
>
> Thanks
> WangDi
>
> On Fri, Aug 21, 2009 at 3:15 PM, di wang <[email protected]> wrote:
>>
>> Alvaro Aguilera wrote:
>>
>> thanks for the hint, but unfortunately I can't make any
>> updates to the cluster...
>>
>> Do you think both of the problems I experienced are bugs in
>> Lustre and are resolved in current versions?
>>
>> It should be lustre bugs. The 2 processes runs on different node
>> or same node?
>>
>> Thanks
>> WangDi
>>
>>
>> Thanks.
>> Alvaro.
>>
>>
>> On Fri, Aug 21, 2009 at 6:32 AM, di wang <[email protected]> wrote:
>>
>> Hello,
>>
>> You may see bug 17197 and try to apply this patch
>> https://bugzilla.lustre.org/attachment.cgi?id=25062 to your
>> lustre src. Or you can wait 1.8.2.
>>
>> Thanks
>> Wangdi
>>
>> Alvaro Aguilera wrote:
>>
>> Hello,
>>
>> as a project for college I'm doing a behavioral comparison
>> between Lustre and CXFS when dealing with simple strided files
>> using POSIX semantics. On one of the tests, each participating
>> process reads 16 chunks of data with a size of 32MB each, from
>> a common, strided file using the following code:
>>
>> ------------------------------------------------------------------------------------------
>> int myfile = open("thefile", O_RDONLY);
>>
>> MPI_Barrier(MPI_COMM_WORLD); // the barriers are only to help measuring time
>>
>> off_t distance = (numtasks-1)*p.buffersize;
>> off_t offset = rank*p.buffersize;
>>
>> int j;
>> lseek(myfile, offset, SEEK_SET);
>> for (j = 0; j < p.buffercount; j++) {
>>     read(myfile, buffers[j], p.buffersize); // buffers are aligned to the page size
>>     lseek(myfile, distance, SEEK_CUR);
>> }
>>
>> MPI_Barrier(MPI_COMM_WORLD);
>>
>> close(myfile);
>> ------------------------------------------------------------------------------------------
>>
>> I'm facing the following problem: when this code is run in
>> parallel the read operations on certain processes start to
>> need more and more time to complete. I attached a graphical
>> trace of this, when using only 2 processes.
>> As you see, the read operations on process 0 stay more or less
>> constant, taking about 0.12 seconds to complete, while on
>> process 1 they increase up to 39 seconds!
>>
>> If I run the program with only one process, then the time
>> stays at ~0.12 seconds per read operation. The problem doesn't
>> appear if the O_DIRECT flag is used.
>>
>> Can somebody explain to me why is this happening? Since I'm
>> very new to Lustre, I may be making some silly mistakes, so be
>> nice to me ;)
>>
>> I'm using Lustre SLES 10 Patchlevel 1, Kernel
>> 2.6.16.54-0.2.5_lustre.1.6.5.1.
>>
>>
>> Thanks!
>>
>> Alvaro Aguilera.
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
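
P.S. To make the two read_ahead_stats snapshots at the top of this message easier to compare, here is a rough C sketch that subtracts the "before" counters from the "after" counters and computes the share of misses during the test. The struct and function names (ra_stats, ra_delta, ra_miss_ratio, ra_report) are made up for this illustration and are not part of Lustre or lctl. Reading misses / (hits + misses) as a miss ratio is my own assumption about how the counters relate, not something taken from the Lustre documentation.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for a few counters copied by hand from
 * "lctl get_param llite.*.read_ahead_stats"; not a Lustre type. */
struct ra_stats {
    unsigned long long hits;
    unsigned long long misses;
    unsigned long long miss_inside_window;
    unsigned long long failed_lock_match;
};

/* Change in each counter over the test run (after minus before).
 * The counters are cumulative, so "after" must not be smaller. */
struct ra_stats ra_delta(struct ra_stats before, struct ra_stats after)
{
    struct ra_stats d;

    assert(after.hits >= before.hits);
    assert(after.misses >= before.misses);
    assert(after.miss_inside_window >= before.miss_inside_window);
    assert(after.failed_lock_match >= before.failed_lock_match);

    d.hits = after.hits - before.hits;
    d.misses = after.misses - before.misses;
    d.miss_inside_window = after.miss_inside_window - before.miss_inside_window;
    d.failed_lock_match = after.failed_lock_match - before.failed_lock_match;
    return d;
}

/* Assumed interpretation: share of misses among hits + misses.
 * Returns 0.0 when nothing was counted, to avoid dividing by zero. */
double ra_miss_ratio(struct ra_stats d)
{
    unsigned long long total = d.hits + d.misses;

    return total ? (double)d.misses / (double)total : 0.0;
}

/* Print the per-test deltas in roughly the same layout lctl uses. */
void ra_report(struct ra_stats before, struct ra_stats after)
{
    struct ra_stats d = ra_delta(before, after);

    printf("hits                %llu\n", d.hits);
    printf("misses              %llu\n", d.misses);
    printf("miss inside window  %llu\n", d.miss_inside_window);
    printf("failed lock match   %llu\n", d.failed_lock_match);
    printf("miss ratio          %.3f\n", ra_miss_ratio(d));
}
```

Calling ra_report() from a main() with the values above, i.e. {7301235, 10546, 1, 0} for "before" and {7506005, 330064, 319450, 17294} for "after", prints the deltas for this test run. Only four counters are covered here; the remaining ones could be added to the struct in the same way.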
