On Mon, 25 Jun 2007, sad...@gmx.net wrote:

> I *assume* loose coupled jobs

Hmm, given Sun's supposed involvement in this project, I'm really surprised that nobody from Sun has stepped in to explain this.

I don't use SGE anymore, but some years ago, when I did, I worked on integrating LAM/MPI with it; here's what I remember:

- loose integration: the batch job is given a file that contains the
  list of nodes and the number of slots (= processes that can be run
  on each node). The scheduler knows that these resources are occupied
  until the batch job finishes. SGE is not involved in starting the
  processes on remote nodes; the job has to do everything by itself
  (e.g. by using rsh/ssh). The end of the batch script, or an early
  termination (e.g. for exceeded runtime), tells SGE that the job has
  ended, and SGE makes no effort to terminate processes launched on
  remote nodes. Removing a running job means that signals are sent
  only to the process on the main node of the job; the job has to take
  care of propagating signals or cleaning up on remote nodes by itself.

- tight integration: the batch job is given the same nodes file, but
  SGE expects the job to use SGE's own launch mechanism, which is
  based on NetBSD's rsh [1]. The SGE daemons on the remote nodes then
  know which processes belong to the job, and one SGE rsh connection
  is allowed for each slot allocated to the job on that node. Upon
  termination of the job, SGE tries to kill all processes that belong
  to the job on all allocated nodes. To track the processes that
  belong to a job on a node, the daemon uses a pool of group IDs that
  are normally not used and sets an additional group ID
  (setgroups(2)) on the launched process(es). This call is available
  only to 'root', so there is no way for user processes to escape
  (e.g. by creating a separate process group), and upon termination of
  the job all processes (including spawned ones) that are marked with
  the job-specific group ID are killed.
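To make the loose-integration case concrete, here is a minimal Python sketch of the bookkeeping the job script itself has to do, because SGE does none of it. It assumes the nodes file follows the `$PE_HOSTFILE` layout (one line per node: `hostname slots queue processor-range`) and that ssh is the remote shell; the function names and the `pkill -f` cleanup are hypothetical choices for illustration, and the commands are only built here, not executed.

```python
from typing import List, Tuple

# Hypothetical helpers for a loosely integrated job; names are illustrative only.


def parse_hostfile(text: str) -> List[Tuple[str, int]]:
    """Return (host, slots) pairs, assuming 'host slots [queue] [range]' per line."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        nodes.append((fields[0], int(fields[1])))
    return nodes


def launch_commands(nodes: List[Tuple[str, int]], program: str,
                    remote_shell: str = "ssh") -> List[List[str]]:
    """One remote-shell command per slot; SGE starts none of these itself."""
    return [[remote_shell, host, program]
            for host, slots in nodes
            for _ in range(slots)]


def cleanup_commands(nodes: List[Tuple[str, int]], program: str,
                     remote_shell: str = "ssh") -> List[List[str]]:
    """One cleanup command per distinct host, since SGE only signals the main node."""
    hosts: List[str] = []
    for host, _ in nodes:
        if host not in hosts:
            hosts.append(host)
    return [[remote_shell, host, "pkill", "-f", program] for host in hosts]


if __name__ == "__main__":
    sample = "node01 2 all.q@node01 UNDEFINED\nnode02 1 all.q@node02 UNDEFINED\n"
    nodes = parse_hostfile(sample)
    for cmd in launch_commands(nodes, "./solver"):
        print(" ".join(cmd))
```

In a real job script the launch lists would be handed to something like `subprocess.Popen`, and the cleanup commands run at the end of the script or whenever the job notices it is being terminated; that per-job bookkeeping is exactly what tight integration moves into SGE.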
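The group-ID tracking used for tight integration can be illustrated with a toy model. The Python sketch below does not call setgroups(2) (that needs root); it simulates processes as records carrying a set of supplementary group IDs. The GID pool values, the class and method names, and the per-slot counter are hypothetical, chosen only to mirror the description above: one reserved GID per job, one rsh connection per slot, group IDs inherited across fork, and cleanup by GID rather than by process group.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, FrozenSet, List


@dataclass
class Proc:
    pid: int
    pgid: int                 # process group; a user process may change it
    groups: FrozenSet[int]    # supplementary GIDs; only root may change them
    alive: bool = True


class ExecNode:
    """Toy model of one execution host's daemon (hypothetical, not SGE code)."""

    def __init__(self, gid_pool: List[int]):
        self.free_gids = list(gid_pool)   # GIDs assumed unused by real accounts
        self.slots_left: Dict[int, int] = {}
        self.procs: Dict[int, Proc] = {}
        self._next_pid = count(100)

    def start_job(self, slots: int) -> int:
        gid = self.free_gids.pop(0)       # reserve one GID for this job
        self.slots_left[gid] = slots
        return gid

    def rsh_launch(self, gid: int) -> Proc:
        # One remote-shell connection per allocated slot.
        if self.slots_left[gid] == 0:
            raise RuntimeError("no free slot for this job on this node")
        self.slots_left[gid] -= 1
        pid = next(self._next_pid)
        # Stands in for the daemon (as root) calling setgroups(2) before exec.
        proc = Proc(pid, pgid=pid, groups=frozenset({gid}))
        self.procs[pid] = proc
        return proc

    def fork(self, parent: Proc) -> Proc:
        # A child inherits its parent's process group and supplementary groups.
        pid = next(self._next_pid)
        child = Proc(pid, pgid=parent.pgid, groups=parent.groups)
        self.procs[pid] = child
        return child

    @staticmethod
    def setsid(proc: Proc) -> None:
        # The process leaves its old process group; its GIDs are untouched.
        proc.pgid = proc.pid

    def pgrp_members(self, pgid: int) -> List[int]:
        return [p.pid for p in self.procs.values() if p.alive and p.pgid == pgid]

    def end_job(self, gid: int) -> List[int]:
        # Kill every live process tagged with the job's GID, then recycle the GID.
        killed = []
        for p in self.procs.values():
            if p.alive and gid in p.groups:
                p.alive = False
                killed.append(p.pid)
        del self.slots_left[gid]
        self.free_gids.append(gid)
        return killed


if __name__ == "__main__":
    node = ExecNode(gid_pool=[20000, 20001])
    job = node.start_job(slots=1)
    master = node.rsh_launch(job)
    child = node.fork(master)
    node.setsid(child)                       # child "escapes" its process group
    print(node.pgrp_members(master.pgid))    # → [100]
    print(node.end_job(job))                 # → [100, 101]
```

The design point the model captures: a kill aimed at the original process group misses the child that called setsid(), while the job-specific GID, which an unprivileged process cannot drop, still identifies it.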

[1] There is currently some effort to integrate ssh as well; the problem is that the ssh daemon needs some modifications to allow SGE to obtain accounting information. There was also some talk about a TM-like API; unfortunately, progress in this area seems to be very slow, if there is any at all...

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: bogdan.coste...@iwr.uni-heidelberg.de
