> there appear to be some overlaps between the ls_* and lsb_* functions,
> but they seem basically compatible as far as i can tell. almost all
> the functions have a command line version as well, for example:
> lsb_submit()/bsub

  Like Open MPI and ORTE, there are two layers in LSF.  The ls_* APIs
  talk to what is/was historically called "LSF Base" and the lsb_* APIs
  talk to what is/was historically called "LSF Batch".

  The ls_* APIs are essentially "do it now" type functionality for
  writing distributed applications that do not require batch
  functionality.  The ls_* functions do not honour any batch allocation
  or policy in any shape or form.
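
  To make the split concrete, a minimal C sketch touching both layers
  might look like the following (the header names and link libraries are
  assumed to be the usual lsf/lsf.h, lsf/lsbatch.h, -llsf and -lbat;
  adjust for your installation):

    /* Minimal sketch of the two layers; not tied to a specific release. */
    #include <stdio.h>
    #include <lsf/lsf.h>      /* "LSF Base"  - ls_*  calls */
    #include <lsf/lsbatch.h>  /* "LSF Batch" - lsb_* calls */

    int main(void)
    {
        /* Base layer: immediate, non-batch information/execution services. */
        char *cluster = ls_getclustername();
        if (cluster == NULL) {
            ls_perror("ls_getclustername");
            return 1;
        }
        printf("cluster: %s\n", cluster);

        /* Batch layer: must be initialised before any other lsb_* call. */
        if (lsb_init("layer-demo") < 0) {
            lsb_perror("lsb_init");
            return 1;
        }
        /* ... lsb_submit()/lsb_launch()/etc. would follow here ... */
        return 0;
    }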
 
> lsb_getalloc()/none and lsb_launch()/blaunch are new with LSF 7.0, but
> appear to just be a different (simpler) interface to existing
> functionality in the LSB_* env vars and the ls_rexec()/lsgrun commands
> -- although, as you say, perhaps platform will hook or enhance them
> later. but, the key issue is that lsb_launch() just starts tasks -- it
> does not perform or interact with the queue or job control (much?).
> so, you can't use these functions to get an allocation in the first
> place, and you have to be careful not to use them as a way around the
> queuing system.

  The ls_* APIs do not honour a batch allocation, while lsb_launch does.
  lsb_launch will only allow you to start tasks on nodes allocated to
  your job, and it is subject to all the queue/job controls.

  ls_rexec/lsgrun are not used to start batch jobs.
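
  To illustrate, a rough sketch of the 7.0 flow from inside a batch job
  -- get the allocation with lsb_getalloc() and start the tasks with
  lsb_launch() -- might look like this (the exact signatures and option
  values should be checked against your lsbatch.h):

    /* Rough sketch: start one task per allocated slot from inside a
     * batch job (LSF 7.0+).  Signatures assumed; verify locally. */
    #include <stdio.h>
    #include <lsf/lsbatch.h>

    int main(void)
    {
        char **hosts = NULL;
        char  *task_argv[] = { "/bin/hostname", NULL };  /* task to start */
        int    nslots, started;

        if (lsb_init("launch-demo") < 0) {
            lsb_perror("lsb_init");
            return 1;
        }

        /* The hosts allocated to *this* job (assumed: one entry per slot);
         * only meaningful when called from within a running batch job. */
        nslots = lsb_getalloc(&hosts);
        if (nslots < 0) {
            lsb_perror("lsb_getalloc");
            return 1;
        }

        /* Start the task on every allocated slot; RES handles the I/O and
         * process marshalling.  0 = no special options (assumed default). */
        started = lsb_launch(hosts, task_argv, 0, NULL);
        if (started < 0) {
            lsb_perror("lsb_launch");
            return 1;
        }
        printf("started %d of %d tasks\n", started, nslots);
        return 0;
    }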

  In pre-7.0, the method for starting an Open MPI job is essentially:

  $ bsub -n N -a openmpi mpirun.lsf a.out

  Note that you only have the openmpi method and mpirun.lsf if you have
  installed the HPC extensions.
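
  Under that model the launcher (mpirun.lsf and the parallel application
  manager it starts) discovers the allocation from the batch environment
  rather than from an API call.  A trivial C sketch of reading it --
  assuming the usual LSB_HOSTS variable, which lists one host name per
  allocated slot -- would be:

    /* Sketch: read the allocation the way a pre-7.0 launcher wrapper
     * would, from the LSB_HOSTS environment variable. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *env, *hosts, *h;
        int   slot = 0;

        env = getenv("LSB_HOSTS");
        if (env == NULL) {
            fprintf(stderr, "LSB_HOSTS not set - not inside a batch job?\n");
            return 1;
        }

        hosts = strdup(env);                     /* writable copy */
        for (h = strtok(hosts, " "); h != NULL; h = strtok(NULL, " "))
            printf("slot %d -> %s\n", slot++, h);

        free(hosts);
        return 0;
    }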

> [ as a side note, the function ls_rexecv()/lsgrun is the one i have
> heard admins do not like because it can break queuing/accounting, and
> might try to disable somehow. i don't really buy that, because it's
> not like you can disable it and have the system still work, since (as
> above) || job launching depends on it. i guess if you really don't
> care about || launching maybe you could. but, if used properly after a
> proper allocation i don't think there should (or even can) be a
> problem. ]

  Job launching does not depend on it, and admins can explicitly
  turn off support for ls_rexec/lsgrun while allowing lsb_launch to
  continue to function -- thus ensuring that tasks of any form can only
  be started on nodes allocated to the job.

> so, lsb_submit()/bsub is a combination allocate/launch -- you specify 
> the allocation size you want, and when it's all ready, it runs the 
> 'job' (really the job launcher) only on one (randomly chosen) 'head' 
> node from the allocation, with the env vars set so the launcher can 
> use ls_rexec/lsgrun functions to start the rest of the job. there are 
> of course various script wrappers you can use (mpijob, pvmjob, etc) 
> instead of your 'real job'. then, i think lsf *should* try to track 
> what processes get started via the wrapper / head process so it knows 
> they are part of the same job. i dunno if it really does that -- but, 
> my guess is that at the least it assumes the allocation is in use 
> until the original process ends. in any case, the wrapper / head 
> process examines the environment vars and uses ls_rexec()/lsgrun or 
> the like to actually run N copies of the 'real job' executable. in 
> 7.0, it can conveniently use lsb_getalloc() and lsb_launch(), but that
> doesn't really change any semantics as far as i know. one could
> imagine that calling lsb_launch() instead of ls_rexec() might be
> preferable from a process tracking point of view, but i don't see why
> Platform couldn't hook ls_rexec() just as well as lsb_launch().

  ls_rexec does not honour batch semantics.  Prior to LSF 7 there is
  an additional parallel application manager that is started when the
  -a openmpi option is specified.  It handles I/O marshalling, signaling
  and task accounting for the complete parallel job across all nodes.
  In LSF 7, this functionality has been embedded directly into the RES
  daemon and is invoked when lsb_launch is used.

  Yes, you could use ls_rexec, but it does not handle the I/O and process
  marshalling -- you need to handle that yourself if you use ls_rexec.

  The first node is not random; it is the "best" match within the
  allocation, based on the resource requirements for the job.

  Since you are referring to the mpijob/pvmjob scripts, I would guess
  you do not have the HPC extensions installed, as these are fairly
  simplistic wrappers that don't make use of the parallel application
  manager.
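
  For completeness, the rough programmatic counterpart of the bsub line
  earlier is lsb_submit().  A sketch follows; the struct and constant
  names are as I recall them from lsbatch.h, so verify before relying on
  it, and the -a method selection (which goes through esub) is not shown:

    /* Rough sketch: submit a 4-way parallel job, roughly equivalent to
     * "bsub -n 4 mpirun.lsf ./a.out".  Names assumed; verify locally. */
    #include <stdio.h>
    #include <string.h>
    #include <lsf/lsbatch.h>

    int main(void)
    {
        struct submit      req;
        struct submitReply reply;
        LS_LONG_INT        jobId;
        int i;

        if (lsb_init("submit-demo") < 0) {
            lsb_perror("lsb_init");
            return 1;
        }

        memset(&req, 0, sizeof(req));

        /* Unused resource limits must be set to DEFAULT_RLIMIT. */
        for (i = 0; i < LSF_RLIM_NLIMITS; i++)
            req.rLimits[i] = DEFAULT_RLIMIT;

        req.numProcessors    = 4;     /* minimum slots, like bsub -n 4 */
        req.maxNumProcessors = 4;     /* maximum slots                 */
        req.beginTime        = 0;     /* run as soon as scheduled      */
        req.termTime         = 0;     /* no termination deadline       */
        req.command          = "mpirun.lsf ./a.out";

        jobId = lsb_submit(&req, &reply);
        if (jobId < 0) {
            lsb_perror("lsb_submit");
            return 1;
        }
        printf("submitted job %lld\n", (long long)jobId);
        return 0;
    }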

> there is also an lsb_runjob() that is similar to lsb_launch(), but for
> an already submitted job. so, if one were to lsb_submit() with an
> option set to never launch it automatically, and then one were to run
> lsb_runjob(), you can avoid the queue and/or force the use of certain
> hosts? i guess this is also not a good function to use, but at least
> the queuing system would be aware of any bad behavior (queue skipping
> via ls_placereq() to get extra hosts, for instance) in this case ...

  Not really - lsb_runjob() is essentially an admin function to force a
  job to run irrespective of the current policies/priorities/allocations.
  Unless you have administrator privs it will fail.
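
  For reference only, driving lsb_runjob() from code looks roughly like
  the sketch below.  The runJobRequest field names are from memory and
  should be checked against lsbatch.h, and as noted the call will simply
  fail without administrator privileges:

    /* Very rough sketch of forcing an already-submitted job to run via
     * lsb_runjob().  Field names assumed; verify against lsbatch.h. */
    #include <stdio.h>
    #include <string.h>
    #include <lsf/lsbatch.h>

    int main(void)
    {
        struct runJobRequest rreq;
        char *hosts[] = { "hostA" };          /* hypothetical target host */

        if (lsb_init("runjob-demo") < 0) {
            lsb_perror("lsb_init");
            return 1;
        }

        memset(&rreq, 0, sizeof(rreq));
        rreq.jobId    = 1234;                 /* hypothetical job id        */
        rreq.numHosts = 1;
        rreq.hostname = hosts;
        rreq.options  = 0;                    /* see RUNJOB_OPT_* in header */

        if (lsb_runjob(&rreq) < 0) {
            lsb_perror("lsb_runjob");         /* fails without admin privs  */
            return 1;
        }
        return 0;
    }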

  As for growing or shrinking the allocation for a job, that is on the
  roadmap for the near future.  However, as Jeff has previously
  mentioned, on a busy system you could end up waiting for a long time
  to get additional nodes.

  Essentially it boils down to making an asynchronous request for
  additional resources and registering a callback for when something
  can be allocated.

  Regards,
  Bill


-------------
Bill McMillan
Principal Technical Product Manager
Platform Computing

