> there appear to be some overlaps between the ls_* and lsb_* functions,
> but they seem basically compatible as far as i can tell. almost all
> the functions have a command line version as well, for example:
> lsb_submit()/bsub

As with Open MPI and ORTE, there are two layers in LSF. The ls_* APIs talk to what is/was historically called "LSF Base", and the lsb_* APIs talk to what is/was historically called "LSF Batch".

The ls_* APIs are essentially "do it now" functionality for writing distributed applications that do not require batch functionality. The ls_* functions do not honour any batch allocation or policy in any shape or form.

> lsb_getalloc()/none and lsb_launch()/blaunch are new with LSF 7.0, but
> appear to just be a different (simpler) interface to existing
> functionality in the LSB_* env vars and the ls_rexec()/lsgrun commands
> -- although, as you say, perhaps platform will hook or enhance them
> later. but, the key issue is that lsb_launch() just starts tasks -- it
> does not perform or interact with the queue or job control (much?).
> so, you can't use these functions to get an allocation in the first
> place, and you have to be careful not to use them as a way around the
> queuing system.

The ls_* APIs do not honour a batch allocation, while lsb_launch does. lsb_launch will only allow you to start tasks on nodes allocated to your job, and it is subject to all the queue/job controls. ls_rexec/lsgrun are not used to start batch jobs.

In pre-7.0 releases, the method for starting Open MPI is essentially:

    $ bsub -n N -a openmpi mpirun.lsf a.out

Note that you only have the openmpi method and mpirun.lsf if you have installed the HPC extensions.

> [ as a side note, the function ls_rexecv()/lsgrun is the one i have
> heard admins do not like because it can break queuing/accounting, and
> might try to disable somehow. i don't really buy that, because it's
> not like you can disable it and have the system still work, since (as
> above) || job launching depends on it. i guess if you really don't
> care about || launching maybe you could.
> but, if used properly after a proper allocation i don't think there
> should (or even can) be a problem. ]

Job launching does not depend on it. Admins can explicitly turn off support for ls_rexec/lsgrun while allowing lsb_launch to continue to function, thus ensuring that tasks of any form can only be started on nodes allocated to the job.

> so, lsb_submit()/bsub is a combination allocate/launch -- you specify
> the allocation size you want, and when it's all ready, it runs the
> 'job' (really the job launcher) only on one (randomly chosen) 'head'
> node from the allocation, with the env vars set so the launcher can
> use ls_rexec/lsgrun functions to start the rest of the job. there are
> of course various script wrappers you can use (mpijob, pvmjob, etc)
> instead of your 'real job'. then, i think lsf *should* try to track
> what processes get started via the wrapper / head process so it knows
> they are part of the same job. i dunno if it really does that -- but,
> my guess is that at the least it assumes the allocation is in use
> until the original process ends. in any case, the wrapper / head
> process examines the environment vars and uses ls_rexec()/lsgrun or
> the like to actually run N copies of the 'real job' executable. in
> 7.0, it can conveniently use lsb_getalloc() and lsb_launch(), but that
> doesn't really change any semantics as far as i know. one could
> imagine that calling lsb_launch() instead of ls_rexec() might be
> preferable from a process tracking point of view, but i don't see why
> Platform couldn't hook ls_rexec() just as well as lsb_launch().

ls_rexec does not honour batch semantics. Prior to LSF 7, there is an additional parallel application manager that is started when the -a openmpi option is specified. It handles I/O marshalling, signalling and task accounting for the complete parallel job across all nodes. In LSF 7, this functionality has been embedded directly into the RES daemon and is invoked when lsb_launch is used.
Yes, you could use ls_rexec, but it does not handle the I/O and process marshalling; you need to handle that yourself if you use ls_rexec.

The first node is not random; it is the "best" match within the allocation based on the resource requirements for the job.

Since you are referring to the mpijob/pvmjob scripts, I would guess you do not have the HPC extensions installed, as these are fairly simplistic wrappers that don't make use of the parallel application manager.

> there is also an lsb_runjob() that is similar to lsb_launch(), but for
> an already submitted job. so, if one were to lsb_submit() with an
> option set to never launch it automatically, and then one were to run
> lsb_runjob(), you can avoid the queue and/or force the use of certain
> hosts? i guess this is also not a good function to use, but at least
> the queuing system would be aware of any bad behavior (queue skipping
> via ls_placereq() to get extra hosts, for instance) in this case ...

Not really. lsb_runjob() is essentially an admin function to force a job to run irrespective of the current policies/priorities/allocations. Unless you have administrator privileges, it will fail.

As for growing or shrinking the allocation for a job, that is on the roadmap for the near future. However, as Jeff has previously mentioned, on a busy system you could end up waiting a long time to get additional nodes. Essentially, it boils down to making an asynchronous request for additional resources and registering a callback for when something can be allocated.

Regards,
Bill

-------------
Bill McMillan
Principal Technical Product Manager
Platform Computing