Mersenne Digest        Monday, April 2 2001        Volume 01 : Number 835

----------------------------------------------------------------------

Date: Fri, 30 Mar 2001 15:53:58 EST
From: [EMAIL PROTECTED]
Subject: Mersenne: problems with Solaris nice

I run Mlucas on several Sparcs under Solaris at work, and I've noticed
something not so nice about the Solaris nice command. The lowest priority
Solaris allows is 19, which on any other Unix-like OS I've used would mean
the job in question only gets CPU cycles if no high-priority processes are
running. However, under Solaris, something run at priority 19 still tends
to get a not-insignificant share of CPU time - a typical figure is 15% on
a system with one other full-priority job running.

This has forced me to restrict my GIMPS work to our least-used (read:
slowest) Sparcs, since people running actual work-related jobs don't want
to be slowed down by a recreational-math program.

So here's my question: does anyone know of a solution to the above
conundrum? The first thing that came to mind was to run the code as a
cron job, but that's not optimal since folks here tend to run jobs at any
and all hours, often overnight.

What I'm thinking of is inspired by the output of the top utility - it
lists (among other things) the average job load of a system, and what %
of the CPU time each active process is getting. (Since top is available
for Sparc Linux, source code is available.) Would it be possible to write
a script that basically emulates the OS's process scheduler, but controls
just one application? What I'd like this personal scheduler to do is the
following:

1) Every few seconds, get the average system load, in terms of how many
CPUs would be required to run every active process at full speed;

2) If the load exceeds the available number of CPUs, suspend the user
program, but keep everything in main memory; otherwise, keep running.
Any such scheme, in order to be practical, would have to be executable
without superuser privileges.

Thanks,
- -Ernst
_________________________________________________________________________
Unsubscribe & list info -- http://www.scruz.net/~luke/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers

------------------------------

Date: Sat, 31 Mar 2001 08:09:52 -0000
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: problems with Solaris nice

On 30 Mar 2001, at 15:53, [EMAIL PROTECTED] wrote:

> I run Mlucas on several Sparcs under Solaris at work,
> and I've noticed something not so nice about the Solaris
> nice command.

I never did find out why the (unix) command was called "nice". My theory
is that a neolithic unix hacker wanted to show off the latest tool he'd
written but couldn't think of a name for. On being shown the action of
this tool, his colleague murmured "nice"! Should the Solaris "nice"
command be renamed "ugly"?

> The lowest priority Solaris allows is 19,

I believe this is standard unix. It may be that linux et al are diverging
here.

> which on any other Unix-like OS I've used would mean the
> job in question only gets CPU cycles if no high-priority
> processes are running.

Well, that's what the documentation leads you to expect ... For reasons
like making sure that interactive processes are controllable e.g. by
mouse commands, there is always _some_ leakage of CPU cycles, but the
proportion needn't be anything other than very small.

> However, under Solaris, something
> run at priority 19 still tends to get a
> not-insignificant share of CPU time - a typical number
> is 15% on a system with one other full-priority job
> running.

I sort of didn't really believe this until I checked it. But you're
right!
> This has caused me to have to restrict my GIMPS
> work to only our least-used (read: slowest) Sparcs,
> since people running actual work-related jobs don't
> want to be slowed down by a recreational-math program.

Sensible attitude ... it doesn't bother me much on the two Solaris
systems I have access to, since the other jobs running tend to be I/O
bound rather than compute bound.

> So here's my question: does anyone know of a solution
> to the above conundrum? The first thing that came to my
> mind was to run the code as a cron job, but that's not
> optimal since folks here tend to run jobs at any and all
> hours, often overnight.

Check out the Solaris priocntl command. I think you will find that the
default range of priority adjustment is 60 each way. With priocntl you
should be able to fix the Mlucas process so that it adjusts much less -
10 each way should be enough. This will prevent the scheduling priority
rising enough that anyone running another compute-bound process loses a
significant number of CPU cycles.

You may also want to use ps -ful <username> to see how the scheduling
priority, the nice value and the processor share are correlated. (You
might want to try this on a near-empty system with another CPU soak
program running un-niced in the foreground to simulate the loading
imposed by real users.)

I haven't actually tried this & may be wrong, but even if I have the
details wrong there's probably some way of using priocntl to throttle a
process so that it doesn't steal a significant number of CPU cycles on a
compute-bound system.

> What I'm thinking of is inspired by the output of the
> top utility - this lists (among other things) the
> average job load of a system, and also what % of the
> CPU time each active process is getting. (Since top is
> available for Sparc Linux, that means source code is
> available.)
> Would it be possible to write a script to
> basically emulate the OS's process scheduler, but which
> is used to control just one application? What I'd like
> this personal scheduler to do is the following:

I think it would be possible, but it may be unnecessary.

> 1) Every few seconds get the average system load, in
> terms of how many CPUs would be required to run every
> active process at full speed;
>
> 2) If the load exceeds the available number of CPUs,
> suspend the user program, but keep everything in main
> memory; otherwise, keep running.

Oops, this is _silly_. If you run a compute-bound process on an otherwise
totally unloaded system, the load average will be 1.0, so your algorithm
will thrash the process up & down.

I think the load average limit for starting the process should be about
one half less than the number of processors - say 0.5 on a uniprocessor
system - and the load average limit for stopping it should be a little
less than one more than the number of processors - say 1.8 on a
uniprocessor system. The induced hysteresis will prevent the controlled
process thrashing up & down, yet still force it out of the way when
something else wants to use the CPU capacity in a serious way.

> Any such scheme, in order to be practical, would have
> to be executable without superuser privileges.

Shouldn't be a problem. Start the script by looking for the Mlucas
process id. Then, in a loop: sleep for n seconds; check whether the
process still exists (exit if not); check the load average; restart or
suspend as necessary; repeat until hell freezes over. Run the script as a
detached process using nohup &; it can safely run at normal nice/priority
since its resource usage will be minimal. Ordinary users are (normally)
allowed to control their own processes (within reason), so I don't see
that root privilege should be required.
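[Editor's note: Brian's priocntl suggestion earlier in this message might
look roughly like the following. This is an untested sketch from the
priocntl(1) man page as remembered; the pid 12345 is a placeholder, and
the -m/-p values mirror the "10 each way" figure above - check the flag
spellings on an actual Solaris box before relying on them.]

```shell
# List the configured scheduling classes and their priority ranges
priocntl -l

# Display the scheduling class and parameters of a running process
# (12345 is a placeholder pid)
priocntl -d -i pid 12345

# In the timesharing (TS) class, cap how far the scheduler may boost
# the process: -m sets the user priority limit, -p the current priority
priocntl -s -c TS -m 10 -p -10 -i pid 12345
```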
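[Editor's note: the loop Brian sketches can be written as a short Bourne
shell script. Everything here is an illustrative assumption - the process
name Mlucas, the 0.5/1.8 thresholds from above, the 10-second polling
interval, and the use of pgrep and uptime - adjust for your own system.
SIGSTOP suspends the process while keeping it resident in memory;
SIGCONT resumes it.]

```shell
#!/bin/sh
# Personal scheduler sketch: suspend/resume one process based on the
# 1-minute load average. All names and numbers are assumptions.

PROG=Mlucas        # process to throttle (assumption)
INTERVAL=10        # seconds between load checks
START_BELOW=0.5    # resume when the 1-minute load drops below this
STOP_ABOVE=1.8     # suspend when the 1-minute load exceeds this

# 1-minute load average, parsed from uptime(1) output
load1() {
    uptime | sed 's/.*load average[s]*: *//' | cut -d, -f1 | tr -d ' '
}

# Floating-point comparisons via awk; the exit status carries the answer
above() { awk -v l="$1" -v t="$2" 'BEGIN { exit !(l > t) }'; }
below() { awk -v l="$1" -v t="$2" 'BEGIN { exit !(l < t) }'; }

watch() {
    pid=$(pgrep -u "$(id -un)" "$PROG") || exit 1
    while kill -0 "$pid" 2>/dev/null; do   # stop looping once it exits
        l=$(load1)
        if above "$l" "$STOP_ABOVE"; then
            kill -STOP "$pid"              # suspend; stays in main memory
        elif below "$l" "$START_BELOW"; then
            kill -CONT "$pid"              # resume when things quieten down
        fi
        sleep "$INTERVAL"
    done
}

# Run detached as an ordinary user, e.g.:  nohup sh watcher.sh watch &
if [ "${1:-}" = "watch" ]; then watch; fi
```

No superuser privilege is needed, since a user may send STOP/CONT to
their own processes; the hysteresis gap between the two thresholds is
what prevents the thrashing Brian warns about.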
Regards
Brian Beesley

------------------------------

Date: Sat, 31 Mar 2001 22:34:32 -0500
From: Nathan Russell <[EMAIL PROTECTED]>
Subject: Re: Mersenne: problems with Solaris nice

On Saturday 31 March 2001 03:09, Brian J. Beesley wrote:
> On 30 Mar 2001, at 15:53, [EMAIL PROTECTED] wrote:
> > However, under Solaris, something
> > run at priority 19 still tends to get a
> > not-insignificant share of CPU time - a typical number
> > is 15% on a system with one other full-priority job
> > running.
>
> I sort of didn't really believe this until I checked it. But you're
> right!

Under Linux, it is only slightly better. (Note, however, that the 'other'
job is not a particularly kind one in CPU terms!)

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 5528 nathan    14   0   464  464   376 R    85.0  0.3   2:47 yes
  200 nathan    19  19 11720  11M   336 R N   5.3  8.9  1436m mprime
 4126 root       0   0 32860  32M  2180 S     4.9 25.1   4:11 X
 5467 nathan     0   0 14452  14M 10572 S     3.0 11.0   0:08 kmail
 4262 nathan     0   0  9552 9552  7888 S     0.4  7.3   0:17 kdeinit
 4488 nathan     0   0  4524 4524  3440 S     0.3  3.4   0:28 gaim
 4277 nathan     0   0  7140 7140  6360 S     0.2  5.4   0:13 kdeinit
 5514 nathan     1   0   920  920   696 R     0.2  0.7   0:00 top

> > This has caused me to have to restrict my GIMPS
> > work to only our least-used (read: slowest) Sparcs,
> > since people running actual work-related jobs don't
> > want to be slowed down by a recreational-math program.
>
> Sensible attitude ... doesn't bother me much on the two Solaris
> systems I have access to, since the other jobs running tend to be I/O
> bound rather than compute bound.

I wish the admins here would allow distributed computing - they say that
whenever the machine was slow they were getting flames about the
distributed client 'raising the load average'.
(big snip)

> I think the load average limit for starting the process should be
> about one half less than the number of processors - say 0.5 on a
> uniprocessor system - and the load average limit for stopping it
> should be a little less than one more than the number of processors -
> say 1.8 on a uniprocessor system.

I don't know much about such things, but I would note that I was working
quite comfortably during the above test with load averages in the 2.3
range. However, the typical 'uniprocessor system' is not a P3 with a
single user typing email!

> Regards
> Brian Beesley

Nathan

------------------------------

Date: Sun, 1 Apr 2001 03:02:00 -0700 (PDT)
From: Francois Gouget <[EMAIL PROTECTED]>
Subject: Re: Mersenne: problems with Solaris nice

On Fri, 30 Mar 2001 [EMAIL PROTECTED] wrote:
[...]
> 1) Every few seconds get the average system load, in
> terms of how many CPUs would be required to run every
> active process at full speed;
>
> 2) If the load exceeds the available number of CPUs,
> suspend the user program, but keep everything in main
> memory; otherwise, keep running.
>
> Any such scheme, in order to be practical, would have
> to be executable without superuser privileges.

Try loadwatch. There's a package for it on Debian called... loadwatch. I
don't know of a similar package for Solaris, but you can get a loadwatch
clone that I wrote based on source code by G. Garonni. It's called
loadwatcher, and the source code is on my home page. I tested it on
Solaris, so it should pretty much compile and run just fine.

http://fgouget.free.fr/distributed/index-en.shtml#h3

- --
Francois Gouget
[EMAIL PROTECTED]        http://fgouget.free.fr/
Cahn's Axiom: When all else fails, read the instructions.
------------------------------

Date: Sun, 1 Apr 2001 10:22:31 -0000
From: "Brian J. Beesley" <[EMAIL PROTECTED]>
Subject: Re: Mersenne: problems with Solaris nice

On 31 Mar 2001, at 22:34, Nathan Russell wrote:

> Under Linux, it is only slightly better. (note, however, that the
> 'other' job is not a particularly kind one in CPU terms!)

Just checked. Kernel 2.4 behaves just like Solaris. Kernel 2.2 doesn't: a
"nice -n20"'d CPU-bound process gets hardly any CPU time whilst there are
CPU-bound processes running at normal priority.

Telling people not to upgrade to kernel 2.4 simply isn't on. We'll have
to find a solution to this. The ideal solution would be to incorporate
mprime as the null process, but for various reasons this isn't practical.

The upside of kernel 2.4 is that mprime seems to run a little faster on
an otherwise idle system - around 1% or 2% - of course, anything helps!

> I wish the admins here would allow distributed computing - they say
> that whenever the machine is slow they were getting flames about the
> distributed client 'raising the load average'.

The flames are fed by three things: ignorance, stupidity and nothing
else. Of course the load average rises! The point is that you should be
able to run a huge number of low-priority CPU soak programs without
affecting the apparent response of the system to interactive users.
Anyone else wanting to run a background CPU soak program is naturally
going to be affected.

> (big snip)
>
> > I think the load average limit for starting the process should be
> > about one half less than the number of processors - say 0.5 on a
> > uniprocessor system - and the load average limit for stopping it
> > should be a little less than one more than the number of processors
> > - say 1.8 on a uniprocessor system.
>
> I don't know much about such things, but I would note that I was
> working quite comfortably during the above test with load averages in
> the 2.3 range. However, the typical 'uniprocessor system' is not a P3
> with a single user typing email!

The load average is simply the average, over the last interval, of the
number of runnable process threads on the system - irrespective of
priority. If you do "cat /proc/loadavg" (on a linux system; Solaris seems
not to have this capability) the first three numbers are the load
averages over the last one, five and fifteen minutes respectively.
Whether a process is instantaneously at a lower priority (higher priority
number) is irrelevant to this count. Therefore the load average on its
own is a poor indicator of the system's capacity to do work, especially
when the actual work is interactive or I/O bound rather than compute
bound.

If the one-minute load average is less than the number of processors, you
have wasted some cycles during the last minute; you could consider
starting an extra CPU soak process. If you have one too many CPU soak
processes running, the load average will be around one more than the
number of processors, even if there is nothing else active - possibly a
bit less than N+1, since worthwhile CPU soak processes are rarely
perfect: they do tend to do at least some I/O.

The paragraph above shows the reasoning behind my suggestion; the values
I give are on the safe side. Note that it's important that the difference
between the "start" and "stop" thresholds is greater than 1, otherwise
thrashing is likely.
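[Editor's note: Brian's rule generalizes to N processors as "start below
N - 0.5, stop above N + 0.8", keeping the gap greater than 1. A small
sketch, assuming getconf is available for the CPU count (true on Linux
and recent Solaris):]

```shell
#!/bin/sh
# Derive hysteresis thresholds from the CPU count: start the soak
# process when load < N - 0.5, stop it when load > N + 0.8; the gap
# of 1.3 (> 1) is what prevents thrashing.
ncpu=$(getconf _NPROCESSORS_ONLN)   # assumption: getconf exists
start=$(awk -v n="$ncpu" 'BEGIN { printf "%.1f", n - 0.5 }')
stop=$(awk -v n="$ncpu" 'BEGIN { printf "%.1f", n + 0.8 }')
echo "start below $start, stop above $stop"
```

On a uniprocessor this reproduces the 0.5 and 1.8 figures from the text.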
Regards
Brian Beesley

------------------------------

Date: Mon, 02 Apr 2001 13:27:47 EDT
From: [EMAIL PROTECTED]
Subject: Mersenne: reconfigurable MP processor

This is interesting:

http://www.cnn.com/2001/TECH/ptech/03/30/langley.supercomputer/index.html

I wonder what the power dissipation for such a beast is? Sure, each FPU
one creates within the programmable logic may run at only ~100-200 MHz
(typical numbers for high-end programmable logic), but if one has a
thousand of them, that's a *lot* of watts, especially since a typical
programmable gate draws an order of magnitude more power than fixed
logic. Still, if they've got a working prototype, they must have handled
that problem.

- -Ernst

------------------------------

End of Mersenne Digest V1 #835
******************************
