Hubertus wrote:
> 
> A minimal quote from your website :-)

Ok - now I see what you're saying.

Let me expound a bit on this line, from a different perspective.

While big NUMA boxes are currently the largest single system image
machines available, they have their complications.  Their bus and
cache structures and geometry are complex and multilayered.

For more modest, more homogeneous systems, one can benefit from putting
CKRM controllers (I hope I'm using this term correctly here) on things
like memory pages, cpu cycles, disk i/o, and network i/o in order to
provide a fairly rich degree of control over what share of resources
each application class receives, and obtain a balance of resource
usage that is both efficient and controlled.

But for the big NUMA configuration, running some of our customers' most
performance critical applications, one cannot achieve the desired
performance by trying to control all the layers of cache and bus, in
complex geometries, with their various interactions.

So instead one ends up using an orthogonal (thanks, Hubertus) and
simpler mechanism - physical isolation(*).  These nodes, and all their
associated hardware, are dedicated to the sole use of this critical
application.  There is still sometimes non-trivial work done, for a
given application, to tune its performance, but by removing (well, at
least radically reducing) the interactions of other unknown applications
on the same hardware resources, the tuning of the critical application
now becomes a practical, solvable task.

In corporate organizations, this resembles the difference between having
separate divisions with their own P&L statements, kept at arm's length
for all but a few common corporate services [cpusets], versus the more
dynamic trade-offs made within a single division, moving limited
resources back and forth in order to meet changing and sometimes
conflicting objectives in accordance with the priorities dictated by
upper management [CKRM].

 (*) Well, not physical isolation in the sense of unplugging the
     interconnect cables.  Rather logical isolation of big chunks
     of the physical hardware.  And not pure 100% isolation, as
     would come from running separate kernel images, but minimal
     controlled isolation, with the ability to keep out anything
     that causes interference if it doesn't need to be there, on
     those particular CPUs and Memory Nodes.

     And our customers _do_ want to manage these logically isolated
     chunks as named "virtual computers" with system managed permissions
     and integrity (such as the system-wide attribute of "Exclusive"
     ownership of a CPU or Memory by one cpuset, and a robust ability
     to list all tasks currently in a cpuset).  This is a genuine user
     requirement to my understanding, apparently contrary to Andrew's.
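
To make the "Exclusive" attribute and the task listing above a bit more
concrete, here is a minimal sketch in Python.  It is a toy, flat model
and not how the kernel implements cpusets: the class and method names
(CpusetRegistry, create, attach, tasks) are hypothetical, it ignores
Memory Nodes and any hierarchy, and it assumes a task belongs to exactly
one cpuset at a time.

```python
class CpusetRegistry:
    """Toy model of named cpusets (hypothetical; not the kernel's structures)."""

    def __init__(self):
        # name -> {"cpus": set of CPU numbers, "exclusive": bool, "tasks": set of pids}
        self._sets = {}

    def create(self, name, cpus, exclusive=False):
        """Define a cpuset, refusing overlaps that would violate exclusivity."""
        if name in self._sets:
            raise ValueError(f"cpuset {name!r} already exists")
        cpus = set(cpus)
        for other, info in self._sets.items():
            overlap = cpus & info["cpus"]
            # An overlap is refused if either side claims exclusive ownership.
            if overlap and (exclusive or info["exclusive"]):
                raise ValueError(
                    f"CPUs {sorted(overlap)} conflict with cpuset {other!r}"
                )
        self._sets[name] = {"cpus": cpus, "exclusive": exclusive, "tasks": set()}

    def attach(self, name, pid):
        """Move a task into the named cpuset, removing it from any other."""
        if name not in self._sets:
            raise KeyError(name)
        for info in self._sets.values():
            info["tasks"].discard(pid)
        self._sets[name]["tasks"].add(pid)

    def tasks(self, name):
        """List every task currently in the named cpuset."""
        return sorted(self._sets[name]["tasks"])


if __name__ == "__main__":
    reg = CpusetRegistry()
    reg.create("critical", {0, 1, 2, 3}, exclusive=True)
    reg.create("general", {4, 5})
    reg.attach("critical", 7)
    print(reg.tasks("critical"))
```

The point of the sketch is only that exclusivity is a system-wide
property checked when a cpuset is defined, and that membership is
tracked per cpuset so that "which tasks are in here?" has a definite
answer.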

The above is not the only use of cpusets - there's also providing
a base for ports of PBS and LSF workload managers (which if I recall
correctly arose from earlier HPC environments similar to the one
I described above), and there's the work being done by Bull and NEC,
which can better be spoken to by representatives of those companies.

-- 
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373


_______________________________________________
ckrm-tech mailing list
https://lists.sourceforge.net/lists/listinfo/ckrm-tech
