O. Hartmann
Mon, 12 Oct 2009 00:45:44 -0700
Steve Kargl wrote:
> On Mon, Oct 12, 2009 at 03:35:15PM +1100, Alex R wrote:
>> Steve Kargl wrote:
>>> On Mon, Oct 12, 2009 at 01:49:27PM +1100, Alex R wrote:
>>>> Steve Kargl wrote:
>>>>> So, you have 4 cpus and 4 folding-at-home processes and you're trying to use the system with other apps? Switch to 4BSD.
>>>>
>>>> I thought SCHED_ULE was meant to be a much better choice under an SMP environment. Why are you suggesting he rebuild his kernel and use the legacy scheduler?
>>>
>>> If you have N cpus and N+1 numerically intensive applications, ULE may have poor performance compared to 4BSD. In OP's case, he has 4 cpus and 4 numerical intensity (?) applications. He, however, also is trying to use the system in some interactive way.
>>
>> Ah ok. Is this just an accepted thing by the FreeBSD devs, or are they trying to fix it?
>
> Jeff appears to be extremely busy with other projects. He is aware of the problem, and I have set up my system to give him access when/if it is so desired. Here's the text of my last set of tests that I sent to him:
>
> OK, I've managed to recreate the problem. User kargl launches an MPI job on node10 that creates two images on node20. This is command z in the top(1) info. 30 seconds later, user sgk launches an MPI process on node10 that creates 8 images on node20. This is command rivmp in the top(1) info. With 8 available cpus, this is a (slightly) oversubscribed node.
> For 4BSD, I see
>
> last pid: 1432; load averages: 8.68, 5.65, 2.82 up 0+01:52:14 17:07:22
> 40 processes: 11 running, 29 sleeping
> CPU: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
> Mem: 32M Active, 12M Inact, 203M Wired, 424K Cache, 29M Buf, 31G Free
> Swap: 4096M Total, 4096M Free
>
>   PID USERNAME THR PRI NICE   SIZE   RES STATE C   TIME     CPU COMMAND
>  1428 sgk        1 124    0 81788K 5848K CPU3  6   1:13  78.81% rivmp
>  1431 sgk        1 124    0 81788K 5652K RUN   1   1:13  78.52% rivmp
>  1415 kargl      1 124    0 78780K 4668K CPU7  1   1:38  78.42% z
>  1414 kargl      1 124    0 78780K 4664K CPU0  0   1:37  77.25% z
>  1427 sgk        1 124    0 81788K 5852K CPU4  3   1:13  78.42% rivmp
>  1432 sgk        1 124    0 81788K 5652K CPU2  4   1:13  78.27% rivmp
>  1425 sgk        1 124    0 81788K 6004K CPU5  5   1:12  78.17% rivmp
>  1426 sgk        1 124    0 81788K 5832K RUN   6   1:13  78.03% rivmp
>  1429 sgk        1 124    0 81788K 5788K CPU6  7   1:12  77.98% rivmp
>  1430 sgk        1 124    0 81788K 5764K RUN   2   1:13  77.93% rivmp
>
> Notice, the accumulated times appear reasonable. At this point in the computations, rivmp is doing no communication between processes. z is the netpipe benchmark and is essentially sending messages between the two processes over the memory bus.
> For ULE, I see
>
> last pid: 1169; load averages: 7.56, 2.61, 1.02 up 0+00:03:15 17:13:01
> 40 processes: 11 running, 29 sleeping
> CPU: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
> Mem: 31M Active, 9392K Inact, 197M Wired, 248K Cache, 26M Buf, 31G Free
> Swap: 4096M Total, 4096M Free
>
>   PID USERNAME THR PRI NICE   SIZE   RES STATE C   TIME     CPU COMMAND
>  1168 sgk        1 118    0 81788K 5472K CPU6  6   1:18 100.00% rivmp
>  1169 sgk        1 118    0 81788K 5416K CPU7  7   1:18 100.00% rivmp
>  1167 sgk        1 118    0 81788K 5496K CPU5  5   1:18 100.00% rivmp
>  1166 sgk        1 118    0 81788K 5564K RUN   4   1:18 100.00% rivmp
>  1151 kargl      1 118    0 78780K 4464K CPU3  3   1:48  99.27% z
>  1152 kargl      1 110    0 78780K 4464K CPU0  0   1:18  62.89% z
>  1164 sgk        1 113    0 81788K 5592K CPU1  1   0:55  80.76% rivmp
>  1165 sgk        1 110    0 81788K 5544K RUN   0   0:52  62.16% rivmp
>  1163 sgk        1 107    0 81788K 5624K RUN   2   0:40  50.68% rivmp
>  1162 sgk        1 107    0 81788K 5824K CPU2  2   0:39  50.49% rivmp
>
> In the above, processes 1162-1165 are clearly not receiving sufficient time slices to keep up with the other 4 rivmp images. From watching top at a 1 second interval, once the 4 rivmp images hit 100% CPU, they stayed pinned to their cpus and stayed at 100% CPU. It is also seen that processes 1152, 1165 and 1162, 1163 are stuck on cpus 0 and 2, respectively.
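
One way to put a number on the imbalance in the two top(1) snapshots quoted above is to compare the smallest and largest accumulated TIME among the rivmp images. The following is only a minimal Python sketch and is not part of the original thread: the helper names to_seconds and time_spread are hypothetical, and the rows are copied from the quoted top output. It assumes TIME is the tenth whitespace-separated field and is given as minutes:seconds.

```python
# Rows copied from the quoted top(1) output; only rivmp and z rows are needed.
BSD_ROWS = """
1428 sgk 1 124 0 81788K 5848K CPU3 6 1:13 78.81% rivmp
1431 sgk 1 124 0 81788K 5652K RUN 1 1:13 78.52% rivmp
1415 kargl 1 124 0 78780K 4668K CPU7 1 1:38 78.42% z
1414 kargl 1 124 0 78780K 4664K CPU0 0 1:37 77.25% z
1427 sgk 1 124 0 81788K 5852K CPU4 3 1:13 78.42% rivmp
1432 sgk 1 124 0 81788K 5652K CPU2 4 1:13 78.27% rivmp
1425 sgk 1 124 0 81788K 6004K CPU5 5 1:12 78.17% rivmp
1426 sgk 1 124 0 81788K 5832K RUN 6 1:13 78.03% rivmp
1429 sgk 1 124 0 81788K 5788K CPU6 7 1:12 77.98% rivmp
1430 sgk 1 124 0 81788K 5764K RUN 2 1:13 77.93% rivmp
""".strip().splitlines()

ULE_ROWS = """
1168 sgk 1 118 0 81788K 5472K CPU6 6 1:18 100.00% rivmp
1169 sgk 1 118 0 81788K 5416K CPU7 7 1:18 100.00% rivmp
1167 sgk 1 118 0 81788K 5496K CPU5 5 1:18 100.00% rivmp
1166 sgk 1 118 0 81788K 5564K RUN 4 1:18 100.00% rivmp
1151 kargl 1 118 0 78780K 4464K CPU3 3 1:48 99.27% z
1152 kargl 1 110 0 78780K 4464K CPU0 0 1:18 62.89% z
1164 sgk 1 113 0 81788K 5592K CPU1 1 0:55 80.76% rivmp
1165 sgk 1 110 0 81788K 5544K RUN 0 0:52 62.16% rivmp
1163 sgk 1 107 0 81788K 5624K RUN 2 0:40 50.68% rivmp
1162 sgk 1 107 0 81788K 5824K CPU2 2 0:39 50.49% rivmp
""".strip().splitlines()


def to_seconds(mmss):
    """Convert a top(1) TIME field of the form M:SS to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def time_spread(rows, command):
    """Return (min, max) accumulated CPU seconds for rows running `command`."""
    times = [
        to_seconds(fields[9])          # TIME is the tenth field
        for fields in (row.split() for row in rows)
        if fields[-1] == command       # COMMAND is the last field
    ]
    return min(times), max(times)


if __name__ == "__main__":
    for name, rows in (("4BSD", BSD_ROWS), ("ULE", ULE_ROWS)):
        low, high = time_spread(rows, "rivmp")
        print(f"{name}: min={low}s max={high}s ratio={low / high:.2f}")
```

On these two snapshots the slowest rivmp image under 4BSD has 72 s against 73 s for the fastest, while under ULE the slowest has 39 s against 78 s, that is, half the CPU time of its siblings. The snapshots were taken at different points in each run, so only the ratio within a snapshot is comparable, not the absolute times.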
This isn't limited to floating-point-intensive applications; even the operating system itself seems to suffer under SCHED_ULE. I have seen, and reported, several performance issues under heavy load: for seconds (if not minutes!), boxes with 4+ CPUs get as stuck as a UP box does. Those sticky situations are painful when the box needs to be accessed via X11. The remaining four FreeBSD 8.0 boxes used for numerical applications in our lab (the others switched to Linux a long time ago) all use SCHED_ULE, since it was introduced as the superior scheduler compared with the legacy 4BSD. Well, I'll give 4BSD a chance again.
At the moment, even our 8-core Dell PowerEdge box is in production use, but if there is something I can do, meaning benchmarking, I'll give it a try.
Regards,
Oliver

_______________________________________________
freebsd-performance@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "freebsd-performance-unsubscr...@freebsd.org"