Re: [gt-user] How to distribute problems to multiple resources (computers)?

Ian Foster Mon, 19 Nov 2012 06:29:51 -0800

Dear John:

GT is used for that purpose in large grid systems such as the Open Science 
Grid. GRAM provides uniform, secure, reliable job submission to a specific 
site, as well as staging of data in and out. MDS provides monitoring of site 
status. Various groups have used these components to implement so-called 
"meta-schedulers" that use site status information to determine where to 
submit jobs. For the architectural approach, see K. Czajkowski et al., "A 
Resource Management Architecture for Metacomputing Systems," Proc. IPPS/SPDP 
'98 Workshop on Job Scheduling Strategies for Parallel Processing, 
pp. 62-82, 1998.
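
To make the meta-scheduler idea concrete, here is a minimal Python sketch of 
the "decide where to submit" step. It assumes a hypothetical SiteStatus record 
standing in for whatever an information service such as MDS reports about a 
site; the field names, contact strings, and selection rule are illustrative 
and are not part of GT. The sketch only produces a plan; each job would still 
be handed to GRAM at the chosen site (e.g. with globus-job-submit).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SiteStatus:
    # Hypothetical fields standing in for what a site-status
    # information service might report about one site.
    contact: str      # hypothetical GRAM contact for the site
    free_slots: int   # job slots currently idle
    queued_jobs: int  # jobs already waiting in the local batch queue
    up: bool          # whether the site is currently reachable


def choose_site(sites: List[SiteStatus]) -> Optional[SiteStatus]:
    """Pick the reachable site with the most free slots, breaking ties
    by the shortest local queue. Returns None if no site has capacity."""
    candidates = [s for s in sites if s.up and s.free_slots > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda s: (s.free_slots, -s.queued_jobs))


def plan_submissions(
    jobs: List[str], sites: List[SiteStatus]
) -> List[Tuple[str, Optional[str]]]:
    """Assign each job to a site. The scheduler's local view of free
    slots is decremented per assignment so one planning pass does not
    send every job to the same site."""
    plan: List[Tuple[str, Optional[str]]] = []
    for job in jobs:
        site = choose_site(sites)
        if site is None:
            plan.append((job, None))  # no capacity now; retry in a later pass
            continue
        site.free_slots -= 1
        plan.append((job, site.contact))
    return plan


if __name__ == "__main__":
    sites = [
        SiteStatus("siteA.example.org", free_slots=2, queued_jobs=0, up=True),
        SiteStatus("siteB.example.org", free_slots=1, queued_jobs=5, up=True),
        SiteStatus("siteC.example.org", free_slots=10, queued_jobs=0, up=False),
    ]
    for job, contact in plan_submissions(["j1", "j2", "j3", "j4"], sites):
        print(job, "->", contact)
```

Picking the site with the most free slots is only one possible policy; a 
real meta-scheduler might also weigh data location, site policy, or past 
failure rates.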


You asked specifically about the rather specialized sub-problem of distributing 
work to a large and presumably time-varying pool of volunteer computers. One 
could do that with GT, but BOINC is really optimized for that specific problem.
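
In the volunteer-computing case, hosts come and go, so work is typically 
pulled by idle hosts rather than pushed to a fixed site, and work that is 
never reported back has to be reissued. The Python sketch below illustrates 
that pull-and-reissue pattern under simplifying assumptions (a single 
in-memory queue, a fixed per-unit deadline, no result validation or 
replication). It is a hypothetical illustration of the general idea, not 
BOINC's actual protocol or API.

```python
import collections
from typing import Any, Deque, Dict, Iterable, Optional, Tuple


class WorkQueue:
    """Minimal pull-based dispatcher for a time-varying pool of hosts.
    Idle hosts ask for work; assignments not reported back before their
    deadline are returned to the queue and handed to another host."""

    def __init__(self, units: Iterable[str], deadline: float) -> None:
        self.pending: Deque[str] = collections.deque(units)  # not yet handed out
        self.in_flight: Dict[str, Tuple[str, float]] = {}     # unit -> (host, due time)
        self.done: Dict[str, Any] = {}                        # unit -> result
        self.deadline = deadline                              # time a host has to finish

    def _reclaim_expired(self, now: float) -> None:
        # Put units whose holder missed the deadline back in the queue.
        for unit, (_host, due) in list(self.in_flight.items()):
            if now >= due:
                del self.in_flight[unit]
                self.pending.append(unit)

    def request_work(self, host: str, now: float) -> Optional[str]:
        """Called by a volunteer host when it is idle."""
        self._reclaim_expired(now)
        if not self.pending:
            return None
        unit = self.pending.popleft()
        self.in_flight[unit] = (host, now + self.deadline)
        return unit

    def report_result(self, host: str, unit: str, result: Any, now: float) -> bool:
        """Accept a result only from the host currently holding the unit,
        and only before its deadline; late reports are rejected because
        the unit may already have been reissued."""
        entry = self.in_flight.get(unit)
        if entry is None or entry[0] != host or now >= entry[1]:
            return False
        del self.in_flight[unit]
        self.done[unit] = result
        return True
```

Passing `now` explicitly keeps the sketch deterministic; a real server would 
read the clock itself and keep its state in persistent storage.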

Ian.


On Nov 19, 2012, at 8:14 AM, [email protected] wrote:

> Interesting. I thought that's the main reason why anyone would need a grid 
> platform: to distribute problems to multiple computers. If GT5 is not doing 
> that, then what is it doing?
> 
> The quote below is from the GT 5.2.2 documentation [1]:
> "The Grid Resource Allocation and Management (GRAM5) component is used to 
> locate, submit, monitor, and cancel jobs on Grid computing resources. GRAM5 
> is not a Local Resource Manager, but rather a set of services and clients for 
> communicating with a range of different batch/cluster job schedulers using a 
> common protocol. GRAM5 is meant to address a range of jobs where reliable 
> operation, stateful monitoring, credential management, and file staging are 
> important."
> 
> From my understanding, GRAM5 (part of GT5) aims at handling a range of jobs 
> (i.e., not a single job), monitoring, etc. I assume that since it claims to 
> handle a range of jobs, it should somehow figure out which nodes to 
> distribute them to.
> 
> I think my expectations about GT5 are incorrect. Perhaps I am missing the key 
> objectives of GT5. Any clarification would be appreciated.
> 
> Regards, 
> J
> 
> On 11/19/2012 at 4:38 AM, "Steven C Timm" <[email protected]> wrote:
> The GT4 Globus Toolkit did include an implementation of the Monitoring and 
> Discovery Service, which a number of sites can use to advertise to some 
> central service, which could then tell the user where to 
> globus-job-submit (or globusrun-ws -submit, as in GT4).
> 
> In practice, most production grids have some other, non-Globus method of 
> telling the user which sites are available and how many free slots they 
> have. The most common one is the BDII. The Open Science Grid in the US 
> uses that, but also uses software known as GlideinWMS to present the 
> whole grid as a single unified resource to users.
> 
>  
> Steve Timm
> 
>  
>  
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of [email protected]
> Sent: Sunday, November 18, 2012 5:52 PM
> To: gt-user
> Subject: [gt-user] How to distribute problems to multiple resources 
> (computers)?
> 
>  
> Greetings GT community,
> 
>  
> Suppose that a pool of computers is able to donate its idle CPU time. How 
> can a problem (i.e., a piece of code) be executed on them in a distributed 
> manner?
> 
>  
> For example, when I use the command globus-job-submit or globus-job-run, how 
> does my local machine know where these jobs should be submitted?
> 
>  
> I expected that every resource would register itself with a discovery 
> database (service) hosted on one or more servers, and that grid users (e.g. 
> programmers/researchers) would submit problems to some service that 
> dispatches them to multiple resources (CPU donors) according to a scheduler 
> and an execution management plan that decides what to do in case of a 
> failure.
> 
>  
> However, after reading the quick start guide at 
> http://www.globus.org/toolkit/docs/5.2/5.2.2/admin/quickstart/ I fail to see 
> how the above maps to GT5: what the guide describes is largely controlled by 
> the user/programmer (e.g. the user specifies which computer runs which 
> commands).
> 
>  
> Rgrds,
> 
> J
>
