There is a new Globus Incubator Project called Falkon whose goal is
exactly this: handling many small jobs efficiently. The net result
is performance a few orders of magnitude better than with existing
methods, and it should be able to run jobs as short as 1 second
efficiently (95%+) while managing hundreds of processors.
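
To put the 95% figure in perspective, here is a rough back-of-the-envelope
sketch. It assumes efficiency is simply useful run time divided by run time
plus per-job overhead; that definition and the function name are assumptions
made for illustration, not necessarily how Falkon measures it.

```python
def max_overhead(task_seconds, efficiency):
    """Largest per-job overhead (in seconds) that still meets the target,
    assuming efficiency = task_seconds / (task_seconds + overhead)."""
    return task_seconds * (1.0 / efficiency - 1.0)

# A 1-second job at 95% efficiency leaves roughly 0.05 s for dispatch overhead.
print(round(max_overhead(1.0, 0.95), 4))  # 0.0526
```

Under that assumption, every 1-second job has only about 50 ms of budget for
scheduling and dispatch.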

Here are a few links:
Web: http://dev.globus.org/wiki/Incubator/Falkon
Paper:
http://people.cs.uchicago.edu/~iraicu/research/docs/Falkon/Falkon_SC07_v42.pdf
Code: svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon

On the web page above, there are mailing lists you can join, slides you can
look through, other relevant papers, instructions on how to set up and
run Falkon, and other work of ours that branches off from Falkon.

Cheers,
Ioan

Jan Ploski wrote:
> "李辉" <[EMAIL PROTECTED]> wrote on 11/12/2007 10:54:08 AM:
>   
>> Indeed, we are going to do the work you mentioned: packaging 
>> small jobs into “big” jobs. But it is designed only for one special 
>> application. Is it possible to implement a more general component 
>> that packages small jobs into “big” jobs and then submits the “big”
>> jobs to the target site with GRAM? When the LRM (local resource 
>> manager, such as OpenPBS or Torque) receives the packaged “big” jobs, a 
>> C application or scripts on the target site would unpack them into 
>> small jobs again. These small jobs would then be handled by 
>> OpenPBS (the jobs may be stored in its job queues) or by a multi-threaded 
>> program on the target site. 
>> I do not know whether this is a good idea. Has anybody done any
>> research on this problem, or are there any published papers about it?
>>     
>
> I'm preparing a paper which describes a convenient solution which can
> be implemented by job submitters. Some slides are available:
> https://bi.offis.de/wisent/tiki-download_file.php?fileId=656
>
> The "general component" mentioned within the presentation is also 
> available:
> https://bi.offis.de/wisent/tiki-index.php?page=Condor-GT4-BigJobs
>
> The page is geared towards Condor users, but my MultiJob.pm module is
> not in any way Condor-specific. You still have to "package small jobs
> into big jobs" in an application-specific manner, but the module takes
> care of all the synchronization required to run the "big job" at the
> target site.
>
> In the longer term, it would be nice to have this functionality in Globus.
> AFAICS, the current implementation of Globus multijobs doesn't cut it (or
> is just not documented well enough): I found no way to describe an atomic
> multijob consisting of a single-processor job, followed by a
> multi-processor job, followed by a single-processor job at the same site.
>
> Regards,
> Jan Ploski
>   
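
For readers following the thread, the "package small jobs into big jobs, then
unpack them at the target site" idea can be sketched roughly as below. This is
only an illustration under stated assumptions: it is neither Falkon nor
MultiJob.pm, the names pack and run_batch are hypothetical, and the GRAM
submission step between the two functions is left out entirely.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess
import sys

def pack(commands, batch_size):
    """Submitter side: group small job command lines into "big" jobs."""
    return [commands[i:i + batch_size]
            for i in range(0, len(commands), batch_size)]

def run_batch(batch, workers=4):
    """Target side: unpack one "big" job and run its small jobs in threads.

    Returns the exit codes in the same order as the commands in the batch.
    """
    def run_one(cmd):
        return subprocess.run(cmd, capture_output=True, text=True).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, batch))

if __name__ == "__main__":
    # Ten trivial placeholder jobs, packed four to a batch.
    small_jobs = [[sys.executable, "-c", "pass"] for _ in range(10)]
    for big_job in pack(small_jobs, 4):
        print(run_batch(big_job))
```

The grouping itself stays application-specific, as noted in the quoted
message; the sketch only shows the unpack-and-run step on a single node with a
thread pool rather than handing the small jobs back to the LRM queue.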
