Dear John: GT is used for that purpose in large grid systems such as the Open Science Grid. GRAM provides uniform, secure, reliable job submission to a specific site, and supports staging of data in and out. MDS provides monitoring of site status. Various groups have used these components to implement so-called "meta-schedulers" that use site status information to determine where to submit jobs. For the architectural approach, see K. Czajkowski et al., "A Resource Management Architecture for Metacomputing Systems," Proc. IPPS/SPDP '98 Workshop on Job Scheduling Strategies for Parallel Processing, pp. 62-82, 1998.
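As a rough illustration of the meta-scheduler idea, here is a minimal sketch. It assumes each site's status has already been fetched from an information service (MDS or a BDII) into a plain record. The `SiteStatus` type, the site names, and the `pick_site` helper are hypothetical and are not part of any Globus API. The sketch only picks the available site with the most free slots; the job would then be submitted to that site, for example with globus-job-submit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteStatus:
    """Hypothetical snapshot of one site's state, as an information service might report it."""
    name: str          # hypothetical site identifier / contact string
    available: bool    # whether the site is currently accepting jobs
    free_slots: int    # number of idle job slots the site advertises


def pick_site(sites: List[SiteStatus], slots_needed: int = 1) -> Optional[SiteStatus]:
    """Return the available site with the most free slots, or None if no site fits."""
    candidates = [s for s in sites if s.available and s.free_slots >= slots_needed]
    if not candidates:
        return None
    # Greedy policy: prefer the site advertising the most idle capacity.
    return max(candidates, key=lambda s: s.free_slots)


if __name__ == "__main__":
    sites = [
        SiteStatus("site-a.example.org", True, 12),
        SiteStatus("site-b.example.org", False, 300),  # down, so it is ignored
        SiteStatus("site-c.example.org", True, 40),
    ]
    chosen = pick_site(sites, slots_needed=4)
    print(chosen.name if chosen else "no site available")  # prints site-c.example.org
```

Choosing greedily by free slots is only one possible policy. A production meta-scheduler would typically also weigh queue lengths, data location, and site policy.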
You asked specifically about the rather specialized sub-problem of distributing work to a large and presumably time-varying pool of volunteer computers. One could do that with GT, but BOINC is really optimized for that specific problem.

Ian

On Nov 19, 2012, at 8:14 AM, [email protected] wrote:

> Interesting. I thought that was the main reason anyone would need a grid platform: to distribute problems to multiple computers. If GT5 is not doing that, then what is it doing?
>
> The quote below is from the documentation of GT5.2.2[1]:
> "The Grid Resource Allocation and Management (GRAM5) component is used to locate, submit, monitor, and cancel jobs on Grid computing resources. GRAM5 is not a Local Resource Manager, but rather a set of services and clients for communicating with a range of different batch/cluster job schedulers using a common protocol. GRAM5 is meant to address a range of jobs where reliable operation, stateful monitoring, credential management, and file staging are important."
>
> From my understanding, GRAM5 (part of GT5) aims at handling a range of jobs (i.e. not a single job) plus monitoring, etc. I assume that since it claims to handle a range of jobs, it should somehow figure out which nodes to distribute them to.
>
> I think my expectations about GT5 are incorrect. Perhaps I am missing the key objectives of GT5. Any clarifications would be appreciated.
>
> Regards,
> J
>
> On 11/19/2012 at 4:38 AM, "Steven C Timm" <[email protected]> wrote:
> The GT4 Globus Toolkit did include an implementation of the Monitoring and Discovery Service, which can be used by a number of sites to advertise to some central service, which could then tell the user where to globus-job-submit (or globusrun-ws -submit, as GT4 did).
>
> In practice, most production grids have some other non-Globus method of telling the user which sites are available and how many free slots they have. The most common one is the BDII. The Open Science Grid in the US uses that, but also uses software known as GlideinWMS to present the whole grid to users as a single unified resource.
>
> Steve Timm
>
> From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
> Sent: Sunday, November 18, 2012 5:52 PM
> To: gt-user
> Subject: [gt-user] How to distribute problems to multiple resources (computers)?
>
> Greetings GT community,
>
> Suppose that a pool of computers is able to donate its idle CPU time. How can a problem (i.e. a piece of code) be executed on them in a distributed manner?
>
> For example, when I use the command globus-job-submit or globus-job-run, how will my local machine know where these jobs should be submitted?
>
> I'm expecting that every resource registers itself with a discovery database (service) hosted on one or more servers, and that when grid users (e.g. programmers/researchers) submit problems, they submit them somewhere that dispatches them to multiple resources (CPU donors) according to a scheduler and an execution management plan that decides what to do in case of a failure.
>
> However, I fail to see how the above thoughts map to GT5 after reading the quick start guide at http://www.globus.org/toolkit/docs/5.2/5.2.2/admin/quickstart/. What the guide describes is largely controlled by the user/programmer (e.g. he specifies which computer executes which commands).
>
> Rgrds,
> J
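The model the original question has in mind (resources register with a discovery service, and a dispatcher sends each job to a registered resource and moves on to another one on failure) can be sketched in a few lines. This is a minimal, hypothetical sketch: the `Registry` class, its methods, `dispatch`, and the `run_on` callback stand in for real discovery and submission services (such as MDS/BDII and GRAM) and are not Globus APIs.

```python
from typing import Callable, List


class Registry:
    """Hypothetical discovery service: resources register and deregister themselves."""

    def __init__(self) -> None:
        self._resources: List[str] = []

    def register(self, resource: str) -> None:
        if resource not in self._resources:
            self._resources.append(resource)

    def deregister(self, resource: str) -> None:
        if resource in self._resources:
            self._resources.remove(resource)

    def resources(self) -> List[str]:
        return list(self._resources)


def dispatch(job: str, registry: Registry,
             run_on: Callable[[str, str], bool]) -> str:
    """Try the job on each registered resource in turn until one succeeds.

    `run_on(resource, job)` is a hypothetical stand-in for real job submission;
    it returns True on success. A failing resource is simply skipped, which is
    the simplest possible failure-handling plan.
    Raises RuntimeError if every registered resource fails.
    """
    for resource in registry.resources():
        if run_on(resource, job):
            return resource
    raise RuntimeError(f"job {job!r} failed on all registered resources")


if __name__ == "__main__":
    reg = Registry()
    for host in ("donor-1", "donor-2", "donor-3"):
        reg.register(host)

    # Simulate a volunteer machine that has gone away: donor-1 always fails.
    def fake_run(resource: str, job: str) -> bool:
        return resource != "donor-1"

    print(dispatch("render-frame-42", reg, fake_run))  # prints donor-2
```

Real systems need considerably more than this, for example timeouts for hosts that disappear mid-job and checks on the results returned by untrusted volunteer machines.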
