Re: [gt-user] excessive latency

Ioan Raicu Thu, 24 Jul 2008 07:33:59 -0700

Here is the pointer to Falkon:
http://dev.globus.org/wiki/Incubator/Falkon

BTW, with Falkon, 10K short-running tasks can be completed in a matter of a few seconds (1~10 sec), given a fast enough file system and enough processors! There is of course the overhead to allocate the initial resources, which is usually on the order of 10~60 seconds if the resources are available, but this is a one-time cost.
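
To put the one-time cost in perspective, here is a minimal sketch of the arithmetic, using only the figures quoted above (10~60 seconds of allocation, 10K tasks). The function name amortized_overhead is made up for illustration and is not part of Falkon.

```python
def amortized_overhead(allocation_seconds, num_tasks):
    """Spread a one-time allocation cost evenly over all tasks (seconds per task)."""
    return allocation_seconds / num_tasks


if __name__ == "__main__":
    num_tasks = 10_000
    # 10 s and 60 s are the two ends of the allocation range quoted above.
    for allocation_seconds in (10, 60):
        per_task = amortized_overhead(allocation_seconds, num_tasks)
        print(f"{allocation_seconds} s allocation over {num_tasks} tasks: "
              f"{per_task * 1000:.0f} ms per task")
```

Even at the slow end of that range, the allocation adds only a few milliseconds per task once it is spread over 10K tasks.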


Ioan

Charles Bacon wrote:

On Jul 24, 2008, at 3:18 AM, Hans-Martin Adorf wrote:
We once had a "sampling" use case that would have required submitting
10,000 or more short-running jobs to the grid, in bunches of 100 or so
at the same time, but refrained from even trying to implement it due to
the latency issues discussed in this thread.
This is the kind of use case that Ioan Raicu has been writing about. The standard grid way to deal with this is to run what are called "pilot jobs". Instead of submitting 100 short-running jobs, you submit some small number of jobs, and leave them as longer-running control nodes that you feed your shorter jobs to. Ioan's particular implementation is Falkon, but people have also done this kind of thing with Condor-G glide-ins and other technologies.
It's a good way of reducing your scheduling startup costs, and is in use by several production grids.
Charles
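
For readers new to the idea, the pilot-job pattern described above can be sketched locally as follows. This is only an illustration under stated assumptions: threads stand in for the long-running jobs on grid nodes, an in-process queue stands in for the dispatcher, and the names run_with_pilots and pilot are hypothetical. It does not use the Falkon or Condor-G interfaces.

```python
import queue
import threading


def run_with_pilots(tasks, num_pilots=4):
    """Run the callables in `tasks` on a small, fixed set of long-lived workers."""
    task_queue = queue.Queue()
    results = [None] * len(tasks)

    def pilot():
        # A pilot is started once and then keeps pulling short tasks,
        # so the startup cost is paid per pilot, not per task.
        while True:
            item = task_queue.get()
            if item is None:  # stop marker: no more work for this pilot
                break
            index, task = item
            results[index] = task()

    # "Submit" only a small number of long-running jobs (the pilots).
    pilots = [threading.Thread(target=pilot) for _ in range(num_pilots)]
    for p in pilots:
        p.start()

    # Feed the many short tasks to whichever pilot is free.
    for index, task in enumerate(tasks):
        task_queue.put((index, task))

    # One stop marker per pilot, then wait for all of them to finish.
    for _ in pilots:
        task_queue.put(None)
    for p in pilots:
        p.join()
    return results


if __name__ == "__main__":
    short_tasks = [lambda i=i: i * i for i in range(10_000)]
    squares = run_with_pilots(short_tasks, num_pilots=4)
    print(len(squares), squares[:5])
```

The point of the design is that the number of scheduler submissions equals the number of pilots, while the number of tasks can grow independently.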


--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: [EMAIL PROTECTED]
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================