> This is a very interesting example, and I wish we had known about
> collectl a year ago before we invested time in writing data gathering
> scripts which aren't as useful as what you have here.

I had mentioned collectl/lustre in a couple of places before, but I guess I wasn't loud enough. 8-) The important thing is I got your attention.

> One question - is this "over readahead" still a problem? I know there
> was a bug like this (anything over 8kB was considered to be sequential
> and invoked readahead because it generated 3 consecutive pages of IO),
> but I thought it had been fixed some time ago. There is a sanity.sh
> test_101 that exercises random reads and checks that there are no
> discarded pages.

Actually, as we speak I'm getting ready to release a new version of collectl (stay tuned). While I don't have any specific readahead needs, I believe there is still something not right. I also think the operations manual is misleading: it says readahead is triggered after the second sequential read, and one could interpret that to mean your second read invokes readahead, when it's really not until your third. Furthermore, 'read' sounds like a read() call when in fact it really means - as you stated above - the 3rd page, not the 3rd call. And finally, when you say this has been fixed, what exactly does that mean? Does readahead work differently now?
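To make the "3rd page, not 3rd call" distinction concrete, here is a toy model of the trigger being discussed (my own sketch for illustration, NOT the actual Lustre readahead code): sequential access is declared once 3 consecutive pages have been touched.

```python
# Toy model (an assumption for illustration, not Lustre's implementation):
# readahead fires once a run of 3 consecutive page indices is seen,
# i.e. on the 3rd sequential *page*, not the 2nd read() call.
def readahead_fires(page_indices):
    """Return True if any run of 3 consecutive page indices occurs."""
    run = 1
    for prev, cur in zip(page_indices, page_indices[1:]):
        run = run + 1 if cur == prev + 1 else 1
        if run >= 3:
            return True
    return False

print(readahead_fires([10, 11]))        # False: only 2 consecutive pages
print(readahead_fires([10, 11, 12]))    # True: 3rd consecutive page
print(readahead_fires([10, 20, 30]))    # False: random access
```

Note that under this model a single read of more than 8kB already touches 3 consecutive pages and so looks "sequential", which is exactly the old bug described in the quoted text above.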
Anyhow, getting back to some of my experiments (these are on 1.6.4.3). First of all, I discovered that my perl script doing the random reads was using the perl 'read' function rather than 'sysread', so there's some extra stuff happening behind the scenes that I'm not really sure about. However, it's causing a lot of readahead (or at least excess network traffic) and that puzzles me. Here's an example of doing 8K reads using perl's read function:

[EMAIL PROTECTED] disktests]# collectl -snl -OR -oT
#        <----------Network----------><-------------Lustre Client-------------->
#Time     netKBi pkt-in netKBo pkt-out  Reads KBRead Writes KBWrite  Hits Misses
16:33:41     141    148     26     138     69    276      0       0     0     61
16:33:42     296    307     52     261     70    280      0       0     2     64
16:33:43     311    323     54     275     78    312      0       0     0     64
16:33:44     310    321     54     276     73    292      0       0     0     63
16:33:45     306    316     53     266     63    252      0       0     0     61
16:33:46     301    311     53     267     76    304      0       0     0     68

You can clearly see the traffic on the network matches what lustre is delivering to the client. I also saw in the rpc stats that all the requests were for single pages when they should have been for 2. But now look what happens when I go to 9K reads:

#        <----------Network----------><-------------Lustre Client-------------->
#Time     netKBi pkt-in netKBo pkt-out  Reads KBRead Writes KBWrite  Hits Misses
16:34:42   13017   8887    349    4597     39    156      0       0     0     48
16:34:43   15310  10443    418    5544     65    260      0       0     0     69
16:34:44   18801  12839    501    6601     58    232      0       0     0     62
16:34:45   19436  13263    522    6926     24     96      0       0     0     32

This is clearly generating a lot of network traffic compared to the client's data rate. Perhaps someone more familiar with the subtleties of the perl 'read' function will know why. Anyhow, when I changed my 'read' to 'sysread' things seemed to get better, so perhaps readahead does indeed work differently now? If so, does that mean the current definition is wrong, and what should it be? In any event, playing around a little I kind of stumbled on this one.
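For anyone wondering why 'read' vs 'sysread' matters here: perl's read goes through a buffered I/O layer while sysread is a plain syscall, so the buffer layer can pull in more data from the kernel than the application asked for. The same distinction exists in Python, so here's a sketch in Python (not my actual Perl script) showing a buffered 8K read causing a much larger read underneath:

```python
# Sketch of buffered-vs-raw reads (Python stand-in for perl read/sysread).
# The 64 KiB buffer size here is an arbitrary choice for illustration.
import io
import os
import tempfile

# Create a 1 MiB test file.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"x" * (1 << 20))
tmp.close()

class CountingRaw(io.FileIO):
    """Raw file that records how many bytes the buffer layer pulls in."""
    bytes_read = 0
    def readinto(self, b):
        n = super().readinto(b)
        self.bytes_read += n or 0
        return n

raw = CountingRaw(tmp.name, "rb")
buf = io.BufferedReader(raw, buffer_size=64 * 1024)
data = buf.read(8192)               # the application asks for 8 KiB...
print(len(data), raw.bytes_read)    # ...the raw layer read more than that
buf.close()
os.unlink(tmp.name)
```

So even "random" 8K application reads can turn into larger sequential-looking requests at the syscall level, which is one plausible way the buffered path could interact badly with readahead heuristics.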
I ran my perl script to do a single sysread, sleep a second, and then do another. While I couldn't see it doing any unexpected network traffic for 12K requests, look what happens for 50K ones:

#        <----------Network----------><-------------Lustre Client-------------->
#Time     netKBi pkt-in netKBo pkt-out  Reads KBRead Writes KBWrite  Hits Misses
16:41:32      55     41      2      31      1     50      0       0    12      1
16:41:33      56     46      4      38      1     50      0       0    12      1
16:41:34      55     41      2      31      1     50      0       0    12      1
16:41:35      55     40      2      31      1     50      0       0    12      1
16:41:36    1122    766     30     408      1     50      0       0    12      1
16:41:37      55     41      2      31      1     50      0       0    12      1
16:41:38      55     40      2      31      1     50      0       0     0      1
16:41:39    1130    774     30     412      0      0      0       0    12      0

If it's not readahead, lustre is certainly doing something funky over the wire... And finally, if I remove the sleep and just do a bunch of 50K reads, here's what I see:

#        <----------Network----------><-------------Lustre Client-------------->
#Time     netKBi pkt-in netKBo pkt-out  Reads KBRead Writes KBWrite  Hits Misses
16:45:35    2952   2061     98    1121     49   2450      0       0   564     47
16:45:36    4744   3296    149    1745     40   2000      0       0   468     39
16:45:37    5158   3562    153    1884     46   2300      0       0   541     43
16:45:38    5816   4027    177    2129     47   2350      0       0   552     46
16:45:39    3601   2520    120    1356     52   2600      0       0   610     50
16:45:40    4897   3405    155    1808     51   2550      0       0   564     47
16:45:41    5862   4061    178    2134     49   2450      0       0   588     49
16:45:42    4799   3336    151    1763     52   2600      0       0   588     49
16:45:43    5864   4067    179    2139     52   2600      0       0   573     48
16:45:44    4836   3362    153    1799     38   1900      0       0   444     37
16:45:45    4199   2913    130    1550     55   2750      0       0   587     47
16:45:46    6938   4789    204    2498     53   2650      0       0   600     50
16:45:47    4854   3373    153    1789     46   2300      0       0   494     38

On average it looks like 2-3 times more data is being sent over the network than the client is delivering. Any thoughts on what's going on in these cases? In any event, feel free to download collectl and check things out for yourself; I'll notify this list when the new version is released. Sorry for the long reply...

-mark

> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
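P.S. One quick back-of-envelope check on those 50K numbers, assuming a 4 KiB client page size (an assumption about the test system, not something reported by collectl):

```python
# Pages touched by one 50 KB request, assuming 4 KiB pages.
import math

PAGE_SIZE = 4096          # assumed client page size
request = 50 * 1024       # one 50K sysread
pages = math.ceil(request / PAGE_SIZE)
print(pages)              # 13
```

13 pages per request lines up with the per-second "12 Hits + 1 Miss" pattern in the single-read-per-second table above, which at least suggests the Hits/Misses columns are counting pages per read rather than anything stranger.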
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
