Mersenne Digest        Monday, April 2 2001        Volume 01 : Number 835

----------------------------------------------------------------------

Date: Fri, 30 Mar 2001 15:53:58 EST
From: [EMAIL PROTECTED]
Subject: Mersenne: problems with Solaris nice

I run Mlucas on several Sparcs under Solaris at work, and I've noticed
something not so nice about the Solaris nice command. The lowest priority
Solaris allows is 19, which on any other Unix-like OS I've used would mean
the job in question only gets CPU cycles if no high-priority processes are
running. However, under Solaris, something run at priority 19 still tends
to get a not-insignificant share of CPU time - a typical figure is 15% on
a system with one other full-priority job running.

This has forced me to restrict my GIMPS work to our least-used (read:
slowest) Sparcs, since people running actual work-related jobs don't want
to be slowed down by a recreational-math program.

So here's my question: does anyone know of a solution to the above
conundrum? The first thing that came to mind was to run the code as a
cron job, but that's not optimal since folks here tend to run jobs at any
and all hours, often overnight.

What I'm thinking of is inspired by the output of the top utility - it
lists (among other things) the average job load of a system, and what %
of the CPU time each active process is getting. (Since top is available
for Sparc Linux, source code is available.) Would it be possible to write
a script that basically emulates the OS's process scheduler, but controls
just one application? What I'd like this personal scheduler to do is the
following:

1) Every few seconds, get the average system load, in terms of how many
CPUs would be required to run every active process at full speed;

2) If the load exceeds the available number of CPUs, suspend the user
program, but keep everything in main memory; otherwise, keep running.
Any such scheme, in order to be practical, would have to be executable
without superuser privileges.

Thanks,
- -Ernst
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Sat, 31 Mar 2001 08:09:52 -0000
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: problems with Solaris nice

On 30 Mar 2001, at 15:53, [EMAIL PROTECTED] wrote:

> I run Mlucas on several Sparcs under Solaris at work,
> and I've noticed something not so nice about the Solaris
> nice command.

I never did find out why the (unix) command was called "nice". My theory
is that a neolithic unix hacker wanted to show off the latest tool he'd
written but couldn't think of a name for. On being shown the action of
this tool, his colleague murmured "nice"! Should the Solaris "nice"
command be renamed "ugly"?

> The lowest priority Solaris allows is 19,

I believe this is standard unix. It may be that linux et al are diverging
here.

> which on any other Unix-like OS I've used would mean the
> job in question only gets CPU cycles if no high-priority
> processes are running.

Well, that's what the documentation leads you to expect ... For reasons
like making sure that interactive processes are controllable e.g. by
mouse commands, there is always _some_ leakage of CPU cycles, but the
proportion needn't be anything other than very small.

> However, under Solaris, something
> run at priority 19 still tends to get a
> not-insignificant share of CPU time - a typical number
> is 15% on a system with one other full-priority job
> running.

I sort of didn't really believe this until I checked it. But you're
right!
> This has caused me to have to restrict my GIMPS
> work to only our least-used (read: slowest) Sparcs,
> since people running actual work-related jobs don't
> want to be slowed down by a recreational-math program.

Sensible attitude ... it doesn't bother me much on the two Solaris
systems I have access to, since the other jobs running tend to be I/O
bound rather than compute bound.

> So here's my question: does anyone know of a solution
> to the above conundrum? The first thing that came to my
> mind was to run the code as a cron job, but that's not
> optimal since folks here tend to run jobs at any and all
> hours, often overnight.

Check out the Solaris priocntl command. I think you will find that the
default range of priority adjustment is 60 each way. With priocntl you
should be able to fix the Mlucas process so that it adjusts much less -
10 each way should be enough. This will prevent the scheduling priority
rising enough that anyone running another compute-bound process loses a
significant number of CPU cycles.

You may also want to use ps -ful <username> to see how the scheduling
priority, the nice value and the processor share are correlated. (You
might want to try this on a near-empty system with another CPU soak
program running un-niced in the foreground to simulate the loading
imposed by real users.)

I haven't actually tried this & may be wrong, but even if I have the
details wrong there's probably some way of using priocntl to throttle a
process so that it doesn't steal a significant number of CPU cycles on a
compute-bound system.

> What I'm thinking of is inspired by the output of the
> top utility - this lists (among other things) the
> average job load of a system, and also what % of the
> CPU time each active process is getting. (Since top is
> available for Sparc Linux, that means source code is
> available.)
> Would it be possible to write a script to
> basically emulate the OS's process scheduler, but which
> is used to control just one application? What I'd like
> this personal scheduler to do is the following:

I think it would be possible, but it may be unnecessary.

> 1) Every few seconds get the average system load, in
> terms of how many CPUs would be required to run every
> active process at full speed;
>
> 2) If the load exceeds the available number of CPUs,
> suspend the user program, but keep everything in main
> memory; otherwise, keep running.

Oops, this is _silly_. If you run a compute-bound process on an otherwise
totally unloaded system, the load average will be 1.0, so your algorithm
will thrash the process up & down.

I think the load average limit for starting the process should be about
one half less than the number of processors - say 0.5 on a uniprocessor
system - and the load average limit for stopping it should be a little
less than one more than the number of processors - say 1.8 on a
uniprocessor system. The induced hysteresis will prevent the controlled
process thrashing up & down, yet still force it out of the way when
something else wants to use the CPU capacity in a serious way.

> Any such scheme, in order to be practical, would have
> to be executable without superuser privileges.

Shouldn't be a problem. Start the script by looking for the Mlucas
process id. Then, in a loop: sleep for n seconds; check whether the
process still exists (exit if not); check the load average; restart or
suspend as necessary; repeat until hell freezes over. Run the script as a
detached process using nohup &; it can safely run at normal nice/priority
since its resource usage will be minimal. Ordinary users are (normally)
allowed to control their own processes (within reason), so I don't see
that root privilege should be required.
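[Editor's note: Brian's priocntl suggestion earlier in this message might
look roughly like the following. This is an untested sketch from the
priocntl(1) man page as remembered; the pid 12345 is a placeholder, and
the -m/-p values mirror the "10 each way" figure above - check the flag
spellings on an actual Solaris box before relying on them.]

```shell
# List the configured scheduling classes and their priority ranges
priocntl -l

# Display the scheduling class and parameters of a running process
# (12345 is a placeholder pid)
priocntl -d -i pid 12345

# In the timesharing (TS) class, cap how far the scheduler may boost
# the process: -m sets the user priority limit, -p the current priority
priocntl -s -c TS -m 10 -p -10 -i pid 12345
```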
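[Editor's note: the loop Brian sketches can be written as a short Bourne
shell script. Everything here is an illustrative assumption - the process
name Mlucas, the 0.5/1.8 thresholds from above, the 10-second polling
interval, and the use of pgrep and uptime - adjust for your own system.
SIGSTOP suspends the process while keeping it resident in memory;
SIGCONT resumes it.]

```shell
#!/bin/sh
# Personal scheduler sketch: suspend/resume one process based on the
# 1-minute load average. All names and numbers are assumptions.

PROG=Mlucas        # process to throttle (assumption)
INTERVAL=10        # seconds between load checks
START_BELOW=0.5    # resume when the 1-minute load drops below this
STOP_ABOVE=1.8     # suspend when the 1-minute load exceeds this

# 1-minute load average, parsed from uptime(1) output
load1() {
    uptime | sed 's/.*load average[s]*: *//' | cut -d, -f1 | tr -d ' '
}

# Floating-point comparisons via awk; the exit status carries the answer
above() { awk -v l="$1" -v t="$2" 'BEGIN { exit !(l > t) }'; }
below() { awk -v l="$1" -v t="$2" 'BEGIN { exit !(l < t) }'; }

watch() {
    pid=$(pgrep -u "$(id -un)" "$PROG") || exit 1
    while kill -0 "$pid" 2>/dev/null; do   # stop looping once it exits
        l=$(load1)
        if above "$l" "$STOP_ABOVE"; then
            kill -STOP "$pid"              # suspend; stays in main memory
        elif below "$l" "$START_BELOW"; then
            kill -CONT "$pid"              # resume when things quieten down
        fi
        sleep "$INTERVAL"
    done
}

# Run detached as an ordinary user, e.g.:  nohup sh watcher.sh watch &
if [ "${1:-}" = "watch" ]; then watch; fi
```

No superuser privilege is needed, since a user may send STOP/CONT to
their own processes; the hysteresis gap between the two thresholds is
what prevents the thrashing Brian warns about.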
Regards
Brian Beesley

------------------------------

Date: Sat, 31 Mar 2001 22:34:32 -0500
From: Nathan Russell <[EMAIL PROTECTED]>
Subject: Re: Mersenne: problems with Solaris nice

On Saturday 31 March 2001 03:09, Brian J. Beesley wrote:
> On 30 Mar 2001, at 15:53, [EMAIL PROTECTED] wrote:
> > However, under Solaris, something
> > run at priority 19 still tends to get a
> > not-insignificant share of CPU time - a typical number
> > is 15% on a system with one other full-priority job
> > running.
>
> I sort of didn't really believe this until I checked it. But you're
> right!

Under Linux, it is only slightly better. (Note, however, that the 'other'
job is not a particularly kind one in CPU terms!)

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 5528 nathan    14   0   464  464   376 R    85.0  0.3   2:47 yes
  200 nathan    19  19 11720  11M   336 R N   5.3  8.9  1436m mprime
 4126 root       0   0 32860  32M  2180 S     4.9 25.1   4:11 X
 5467 nathan     0   0 14452  14M 10572 S     3.0 11.0   0:08 kmail
 4262 nathan     0   0  9552 9552  7888 S     0.4  7.3   0:17 kdeinit
 4488 nathan     0   0  4524 4524  3440 S     0.3  3.4   0:28 gaim
 4277 nathan     0   0  7140 7140  6360 S     0.2  5.4   0:13 kdeinit
 5514 nathan     1   0   920  920   696 R     0.2  0.7   0:00 top

> > This has caused me to have to restrict my GIMPS
> > work to only our least-used (read: slowest) Sparcs,
> > since people running actual work-related jobs don't
> > want to be slowed down by a recreational-math program.
>
> Sensible attitude ... doesn't bother me much on the two Solaris
> systems I have access to, since the other jobs running tend to be I/O
> bound rather than compute bound.

I wish the admins here would allow distributed computing - they say that
whenever the machine was slow they were getting flames about the
distributed client 'raising the load average'.
(big snip)

> I think the load average limit for starting the process should be
> about one half less than the number of processors - say 0.5 on a
> uniprocessor system - and the load average limit for stopping it
> should be a little less than one more than the number of processors -
> say 1.8 on a uniprocessor system.

I don't know much about such things, but I would note that I was working
quite comfortably during the above test with load averages in the 2.3
range. However, the typical 'uniprocessor system' is not a P3 with a
single user typing email!

> Regards
> Brian Beesley

Nathan

------------------------------

Date: Sun, 1 Apr 2001 03:02:00 -0700 (PDT)
From: Francois Gouget <[EMAIL PROTECTED]>
Subject: Re: Mersenne: problems with Solaris nice

On Fri, 30 Mar 2001 [EMAIL PROTECTED] wrote:
[...]
> 1) Every few seconds get the average system load, in
> terms of how many CPUs would be required to run every
> active process at full speed;
>
> 2) If the load exceeds the available number of CPUs,
> suspend the user program, but keep everything in main
> memory; otherwise, keep running.
>
> Any such scheme, in order to be practical, would have
> to be executable without superuser privileges.

Try loadwatch. There's a package for it on Debian called... loadwatch. I
don't know of a similar package for Solaris, but you can get a loadwatch
clone that I wrote based on source code by G. Garonni. It's called
loadwatcher, and the source code is on my home page. I tested it on
Solaris, so it should pretty much compile and run just fine.

http://fgouget.free.fr/distributed/index-en.shtml#h3

- --
Francois Gouget
[EMAIL PROTECTED]        http://fgouget.free.fr/
Cahn's Axiom: When all else fails, read the instructions.
------------------------------

Date: Sun, 1 Apr 2001 10:22:31 -0000
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: problems with Solaris nice

On 31 Mar 2001, at 22:34, Nathan Russell wrote:

> Under Linux, it is only slightly better. (note, however, that the
> 'other' job is not a particularly kind one in CPU terms!)

Just checked. Kernel 2.4 behaves just like Solaris. Kernel 2.2 doesn't: a
"nice -n20"'d CPU-bound process gets hardly any CPU time whilst there are
CPU-bound processes running at normal priority.

Telling people not to upgrade to kernel 2.4 simply isn't on. We'll have
to find a solution to this. The ideal solution would be to incorporate
mprime as the null process, but for various reasons this isn't practical.

The upside of kernel 2.4 is that mprime seems to run a little faster on
an otherwise idle system - around 1% or 2% - of course, anything helps!

> I wish the admins here would allow distributed computing - they say
> that whenever the machine is slow they were getting flames about the
> distributed client 'raising the load average'.

The flames are fed by three things: ignorance, stupidity and nothing
else. Of course the load average rises! The point is that you should be
able to run a huge number of low-priority CPU soak programs without
affecting the apparent response of the system to interactive users.
Anyone else wanting to run a background CPU soak program is naturally
going to be affected.

> (big snip)
>
> > I think the load average limit for starting the process should be
> > about one half less than the number of processors - say 0.5 on a
> > uniprocessor system - and the load average limit for stopping it
> > should be a little less than one more than the number of processors
> > - say 1.8 on a uniprocessor system.
>
> I don't know much about such things, but I would note that I was
> working quite comfortably during the above test with load averages in
> the 2.3 range. However, the typical 'uniprocessor system' is not a P3
> with a single user typing email!

The load average is simply the average, over the last interval, of the
number of runnable process threads on the system - irrespective of
priority. If you do "cat /proc/loadavg" (on a linux system; Solaris seems
not to have this capability) the first three numbers are the load
averages over the last one, five and fifteen minutes respectively.
Whether a process is instantaneously at a lower priority (higher priority
number) is irrelevant to this count. Therefore the load average on its
own is a poor indicator of the system's capacity to do work, especially
when the actual work is interactive or I/O bound rather than compute
bound.

If the one-minute load average is less than the number of processors, you
have wasted some cycles during the last minute; you could consider
starting an extra CPU soak process. If you have one too many CPU soak
processes running, the load average will be around one more than the
number of processors, even if there is nothing else active - possibly a
bit less than N+1, since worthwhile CPU soak processes are rarely
perfect: they do tend to do at least some I/O.

The paragraph above shows the reasoning behind my suggestion; the values
I give are on the safe side. Note that it's important that the difference
between the "start" and "stop" thresholds is greater than 1, otherwise
thrashing is likely.
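[Editor's note: Brian's rule generalizes to N processors as "start below
N - 0.5, stop above N + 0.8", keeping the gap greater than 1. A small
sketch, assuming getconf is available for the CPU count (true on Linux
and recent Solaris):]

```shell
#!/bin/sh
# Derive hysteresis thresholds from the CPU count: start the soak
# process when load < N - 0.5, stop it when load > N + 0.8; the gap
# of 1.3 (> 1) is what prevents thrashing.
ncpu=$(getconf _NPROCESSORS_ONLN)   # assumption: getconf exists
start=$(awk -v n="$ncpu" 'BEGIN { printf "%.1f", n - 0.5 }')
stop=$(awk -v n="$ncpu" 'BEGIN { printf "%.1f", n + 0.8 }')
echo "start below $start, stop above $stop"
```

On a uniprocessor this reproduces the 0.5 and 1.8 figures from the text.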
Regards
Brian Beesley

------------------------------

Date: Mon, 02 Apr 2001 13:27:47 EDT
From: [EMAIL PROTECTED]
Subject: Mersenne: reconfigurable MP processor

This is interesting:

http://www.cnn.com/2001/TECH/ptech/03/30/langley.supercomputer/index.html

I wonder what the power dissipation for such a beast is? Sure, each FPU
one creates within the programmable logic may run at only ~100-200 MHz
(typical numbers for high-end programmable logic), but if one has a
thousand of them, that's a *lot* of watts, especially since a typical
programmable gate draws an order of magnitude more power than fixed
logic. Still, if they've got a working prototype, they must have handled
that problem.

- -Ernst

------------------------------

End of Mersenne Digest V1 #835
******************************
