Introduction to Task Migration, LD and DSM on top of Mach

Farid Hajji Sat, 26 Aug 2000 12:32:18 -0700
Hello Bartek,

[help-hurd Cc-ed: for general information / documentation]

This is a small introduction to task migration (TM), load distribution
(LD) and distributed shared memory (DSM) on top of Mach. Depending on your
previous knowledge, you may need to read additional papers. This is only a
bird's-eye view of the topic, but it may be enough to get you started. If
you have any questions, ideas or critiques, please mail them directly to
me.

There are basically two kinds of multiprocessors:
  * strongly coupled multiprocessors, where the processors share the same
    physical memory (let's call this a multiprocessor system)
  * loosely coupled multiprocessors consisting of independent workstations,
    each with its own memory and running an independent kernel (let's call
    this a multicomputer).

Mach has supported multiprocessors for a long time. In fact, threads
are distributed among all available physical processors in a uniform
way, utilizing as much CPU power as possible. Task migration or LD is
_not_ an open issue here, because everything is done in the Mach scheduler
and no data has to be moved around (shared physical memory).

On multicomputers, the situation changes radically. Because each node
has independent physical memory, you have to move a task from one machine
to the other, in order to have its threads run on the processor(s) of
the target machine. How to move a task is the issue of Task Migration.
This depends heavily on distributed shared memory (DSM). Where and when
to move a task depends on the current load of the machines. This is handled
by load distribution (LD).

I won't go into the reasons for providing TM and LD in general. This is
IMHO pretty obvious ;-)

There are basically two strategies for LD:
  * Initial placement: Here, the task is transferred to the node
    designated by LD and then started there. The task won't be moved
    after this.
  * Task migration: Here, the _running_ task can be frozen and moved to
    the new node, unfrozen and then resumed. Tasks can dynamically change
    their location on the multicomputer, depending on the current load
    condition of the individual nodes.

Needless to say, TM is much more difficult to implement than initial
placement. The gained flexibility, however, is often worth the effort.
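
To make the difference between the two strategies concrete, here is a
minimal sketch in Python. It is purely illustrative: the load metric
(number of tasks per node), the threshold and all function names are
assumptions of mine, not part of Mach or of any existing LD
implementation.

```python
from typing import Dict, Optional, Tuple


def pick_node(load: Dict[str, int]) -> str:
    """Return the least loaded node (ties are broken by node name)."""
    return min(sorted(load), key=lambda node: load[node])


def place_initially(load: Dict[str, int]) -> str:
    """Initial placement: choose a node once, before the task starts."""
    node = pick_node(load)
    load[node] += 1  # the task stays on this node for its whole life
    return node


def plan_migration(load: Dict[str, int],
                   threshold: int = 2) -> Optional[Tuple[str, str]]:
    """Task migration: propose moving one running task from the busiest
    node to the idlest one, but only if the imbalance is large enough."""
    busiest = max(sorted(load), key=lambda node: load[node])
    idlest = min(sorted(load), key=lambda node: load[node])
    if load[busiest] - load[idlest] >= threshold:
        return busiest, idlest
    return None  # loads are balanced enough, leave everything in place
```

Initial placement consults the load only once per task, whereas a
migration policy like plan_migration() would be re-evaluated
periodically while the tasks are running.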

There were many attempts to implement process migration in Unix systems.
They all failed because a Unix process has a lot of state that is deeply
buried inside the Unix kernel. Extracting this state on the source node
and inserting it on the target node required substantial modifications to
the Unix kernel. OTOH, Mach tasks don't have the rich semantics of a
traditional Unix process. Only the following aspects of a task need to
be taken into account for migration:
  * The processor state of the threads [a set of registers]
  * The virtual address space as provided by the VM subsystem
  * The IPC port space provided by Mach
Everything else is realized in external OS servers like the Hurd servers.
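
The three pieces of state listed above, together with the
freeze/move/resume sequence of task migration, can be sketched roughly
as follows. This is a toy model in Python, not Mach code: the class
names, the representation of a node as a plain list of tasks and the
migrate() helper are all hypothetical. Note also that the sketch copies
the address space eagerly, which is exactly what a DSM-based
implementation would try to avoid.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ThreadState:
    registers: Dict[str, int]          # processor state of one thread


@dataclass
class Task:
    threads: List[ThreadState]         # processor state of all threads
    address_space: Dict[int, bytes]    # page number -> page contents
    port_space: Dict[int, str]         # local port name -> kind of right
    suspended: bool = False


def migrate(task: Task, source: List[Task], target: List[Task]) -> Task:
    """Freeze a task, move its state to the target node and resume it."""
    task.suspended = True              # freeze: no thread may run now
    moved = Task(
        threads=[ThreadState(dict(t.registers)) for t in task.threads],
        address_space=dict(task.address_space),
        port_space=dict(task.port_space),
        suspended=True,
    )
    source.remove(task)                # the task leaves the source node
    target.append(moved)               # ... and appears on the target
    moved.suspended = False            # unfreeze and resume
    return moved
```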

Moving tasks is greatly simplified by the NORMA-IPC and XMM features of
the NORMA version of Mach. What's NORMA? It's basically an extension of
the Mach port concept. A port is augmented with the address of a node.
Given appropriate privileges, one can hold send/send-once/receive rights
to ports that are located on another node!
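
The idea of a port that carries a node address can be illustrated with
a small Python model. Everything here (PortAddress, Node, the message
queues, the shared "network" dictionary) is made up for illustration
and does not reflect the real NORMA-IPC data structures or interfaces.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PortAddress:
    node: str   # which node holds the receive right
    port: int   # port identifier on that node


class Node:
    def __init__(self, name: str, network: Dict[str, "Node"]) -> None:
        self.name = name
        self.network = network                 # stand-in for the interconnect
        self.queues: Dict[int, List[str]] = {}
        network[name] = self

    def allocate_port(self, port: int) -> PortAddress:
        """Create a port on this node and return its node-qualified name."""
        self.queues[port] = []
        return PortAddress(self.name, port)

    def send(self, dest: PortAddress, message: str) -> str:
        """Deliver locally if possible, otherwise forward to the owner."""
        if dest.node == self.name:
            self.queues[dest.port].append(message)
            return "local"
        self.network[dest.node].queues[dest.port].append(message)
        return "remote"
```

The sender uses the same send() call in both cases; only the node part
of the address decides whether the message leaves the machine.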

With NORMA Mach, one can e.g. make RPCs to servers that are located on
remote nodes. There are quite a lot of potential applications for this.
Imagine e.g. running an ext2fs server on a remote node: You quickly get
an NFS substitute for free.

One of the most intriguing and fascinating aspects of Mach is its external
memory management mechanism. Unlike traditional kernels, Mach communicates
with an external pager through IPC calls to provide virtual memory. With
NORMA-IPC, the external pager doesn't need to run on the same machine
as the requesting kernel! In other words, you can request/submit memory
pages from/to a remote machine. This is the first step towards DSM.

The second important step to DSM is to provide a synchronization mechanism
between the involved kernels that concurrently access a page. One of the
most widely used mechanisms to provide consistency is the "One writer,
many readers" paradigm. This was implemented on top of NORMA Mach and is
called XMM (eXternal Memory Management).
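
The "One writer, many readers" rule can be sketched per page as
follows. This is a generic illustration of the paradigm in Python, with
invented names; it is not a description of how XMM actually implements
it.

```python
from typing import Dict, Optional, Set


class PageManager:
    """Tracks, for each page, either many readers or a single writer."""

    def __init__(self) -> None:
        self.readers: Dict[int, Set[str]] = {}
        self.writer: Dict[int, Optional[str]] = {}

    def request_read(self, page: int, node: str) -> None:
        """Grant read access; a current writer is downgraded to reader."""
        readers = self.readers.setdefault(page, set())
        current = self.writer.get(page)
        if current is not None:
            readers.add(current)
            self.writer[page] = None
        readers.add(node)

    def request_write(self, page: int, node: str) -> Set[str]:
        """Grant exclusive write access and return the set of nodes
        whose copies of the page must be invalidated first."""
        invalidated = set(self.readers.get(page, set()))
        current = self.writer.get(page)
        if current is not None:
            invalidated.add(current)
        invalidated.discard(node)      # the requester keeps its own copy
        self.readers[page] = set()
        self.writer[page] = node
        return invalidated
```

At any time a page has either any number of readers and no writer, or
exactly one writer and no readers.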

With the help of NORMA-IPC and XMM, task migration becomes relatively
easy to implement. A working DSM enables us to map the address space of
the migrated task into another node and NORMA-IPC (+ some minor extensions
to the kernel itself) provides ways to redirect ports to other nodes.

Caveats:
  * NORMA-IPC and XMM are not supported in GNUmach. If you want to do some
    tests, you'll have to use either CMU Mach 3.0 (MK83a) with the Hurd,
    or do some tests in the Lites/RT-Mach environment provided by Keio/NTT.
  * The NORMA/XMM version provided by MK83<?> and RT-Mach is usable, but
    somewhat broken. OSF-RI managed to fix the NORMA part but the support
    for XMM was dropped for some reason. We'll have to contact OSF to get
    the current NORMA implementation and think about designing an XMM
    substitute that would fix the XMM problems.
  * TM and LD are complex and come with many pitfalls. The initial
    enthusiasm among researchers has cooled down somewhat as other
    problems became apparent. I'm nonetheless interested in bringing
    TM and LD to the Hurd, so that we can investigate them further.
  * One important issue is to add redundancy on top of TM/LD, effectively
    distributing user and Hurd tasks among the nodes of the multicomputer,
    so that individual node shutdowns or crashes won't affect the overall
    system performance. Redundancy is yet another area of research that
    is even harder than TM/LD but that is also very interesting.

If you're really interested in the issue of TM, you should read the
important work of Dejan Milojicic. An _abridged_ version of his PhD
thesis can be found in:

  
http://www.usenix.org/publications/library/proceedings/sedms4/full_papers/milojicic.txt

A more complete bibliography can be found in the book with his full
dissertation [the book also contains figures and more information]:

  Load Distribution: Implementation for the Mach Microkernel
  Dejan Milojicic
  Vieweg Advanced Studies in Computer Science, Vieweg
  ISBN 3-528-05424-7, 1993

I'd suggest that you also read some papers about Mach design issues in
general and NORMA-IPC and XMM in particular. Documents about Mach include
the generic OSF docs as well as a lot of papers. A list of URLs was already
posted to this list a few days ago. I can mail it to you separately if
you missed it. If you want, I can also mail you _some_ of these papers
directly, since I've lost track of their URLs...

-Farid.

-- 
Farid Hajji -- Unix Systems and Network Admin | Phone: +49-2131-67-555
Broicherdorfstr. 83, D-41564 Kaarst, Germany  | [EMAIL PROTECTED]
- - - - - - - - - - - - - - - - - - - - - - - + - - - - - - - - - - - -
Murphy's Law fails only when you try to demonstrate it, and thus succeeds.