On Fri, 2008-10-03 at 15:45 -0400, David Dillow wrote:
> Has anyone run obdfilter-survey with large object/thread counts and can
> share their experiences? I am having some odd issues and am trying to
> determine if it is a setup issue on my side, or something I should file
> a bug on with Sun.

I filed bug 17382 for this, along with a quick and dirty patch that fixes the issue.

Currently 'lctl test_brw' exits as soon as any one of the threads exits. When you are only running 16 GB through 512 threads with 1 MB request sizes, each thread will only do 32 requests, so it is possible for one thread to get slightly preferential treatment and finish well before the rest of the pack.

One option is to increase the amount of data written by each thread, which mitigates the issue but does not solve it. It also slows down the test: testing seems to indicate a need for at least 256 requests/thread for more realistic numbers in my environment, or 131,072 MB (128 GB) per OST. More would be even better, but 256 already means a potential runtime of over 30 minutes per variable change, making large surveys painful.

My fix just waits for all threads to exit rather than stopping the test early when the first thread completes. It is still a good idea to raise the number of requests per thread to keep the workload up, but at these scales the throughput drops to almost purely random behavior against the noop scheduler, so the increase in test duration can be kept limited.

--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office

_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
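
For anyone wanting to check the per-thread arithmetic in the message above, here is a minimal sketch in Python. It is not part of lctl or obdfilter-survey; the function names are hypothetical, and it assumes binary units (1 GB = 1024 MB), which is what makes 16 GB across 512 threads come out to 32 one-megabyte requests each.

```python
# Illustrative arithmetic only; these helpers are hypothetical and are
# not part of lctl or obdfilter-survey.
MB_PER_GB = 1024  # assumption: binary units, so 16 GB = 16384 MB


def requests_per_thread(total_gb, threads, request_mb=1):
    """Number of requests each thread issues when the data is split evenly."""
    total_requests = (total_gb * MB_PER_GB) // request_mb
    return total_requests // threads


def data_needed_mb(threads, requests, request_mb=1):
    """Total data (in MB) needed so every thread issues `requests` requests."""
    return threads * requests * request_mb


if __name__ == "__main__":
    # 16 GB through 512 threads with 1 MB requests
    print(requests_per_thread(16, 512))   # 32
    # 256 requests/thread across 512 threads
    print(data_needed_mb(512, 256))       # 131072
```

With these assumptions, 256 requests per thread works out to 131072 MB per OST, i.e. 128 GB in the same binary units.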
