Interesting. I thought that's the main reason why anyone would need a
grid platform: to distribute problems to multiple computers. If GT5 is
not doing that, then what is it doing?
The quote below is from the GT5.2.2 documentation [1]: "The Grid
Resource Allocation and Management (GRAM5) component is used to
locate, submit, monitor, and cancel jobs on Grid computing resources.
GRAM5 is not a Local Resource Manager, but rather a set of services
and clients for communicating with a range of different batch/cluster
job schedulers using a common protocol. GRAM5 is meant to address a
range of jobs where reliable operation, stateful monitoring,
credential management, and file staging are important."
From my understanding, GRAM5 (part of GT5) aims at handling a range of
jobs (i.e. not a single job), plus monitoring, etc. I assume that,
since it claims to handle a range of jobs, it should somehow figure
out which nodes to distribute them to.
I think my expectations about GT5 are incorrect. Perhaps I am missing
the key objectives of GT5. Any clarifications would be appreciated.
Regards,
J
On 11/19/2012 at 4:38 AM, "Steven C Timm" wrote:
The GT4 globus toolkit did include an implementation of the
Monitoring and Discovery Service, which can be used by a number of
sites to advertise to some central service which could then tell the
user where to globus-job-submit (or globusrun-ws -submit, as GT4
did).
In practice, most production grids have some other non-Globus method
of telling the user which sites are available and how many free
slots they have. The most common one is the BDII. The Open Science
Grid in the US uses that, but also uses software known
as GlideinWMS to present the whole grid as a single unified
resource to users.
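
To make the site-selection step concrete, here is a minimal Python sketch. It assumes the list of sites and their free slot counts has already been obtained from some information service; nothing here queries a BDII or MDS. The host names, contact strings, and slot counts are made up for illustration, and the Site class and both helper functions are hypothetical names, not part of any Globus package. The sketch only builds the globus-job-submit command line (contact string first, then the executable and its arguments) and does not run it.

```python
from dataclasses import dataclass


@dataclass
class Site:
    contact: str     # GRAM contact string (hypothetical example values below)
    free_slots: int  # free job slots, as reported by an information service


def pick_site(sites):
    """Return the site with the most free slots, ignoring full sites."""
    candidates = [s for s in sites if s.free_slots > 0]
    if not candidates:
        raise RuntimeError("no site has free slots")
    return max(candidates, key=lambda s: s.free_slots)


def submit_command(site, executable, *args):
    """Build (but do not run) a globus-job-submit command line."""
    return ["globus-job-submit", site.contact, executable, *args]


# Made-up data standing in for what an information service would report.
SITES = [
    Site("grid-a.example.org/jobmanager-pbs", 12),
    Site("grid-b.example.org/jobmanager-condor", 40),
    Site("grid-c.example.org/jobmanager-pbs", 0),
]

if __name__ == "__main__":
    best = pick_site(SITES)
    print(" ".join(submit_command(best, "/bin/hostname", "-f")))
```

The point of the sketch is that the choice of site happens outside GRAM: the user, or a broker acting for the user, picks the contact string, and the submit command is simply told where to go.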
Steve Timm
From: [email protected]
[mailto:[email protected]] On Behalf Of
[email protected]
Sent: Sunday, November 18, 2012 5:52 PM
To: gt-user
Subject: [gt-user] How to distribute problems to multiple resources
(computers)?
Greetings GT community,
Suppose that a pool of computers is able to donate its idle CPU
time. How can a problem (i.e. a piece of code) get executed on them
in a distributed manner?
For example, when I use the command globus-job-submit or
globus-job-run, how will my local machine know where these jobs
should be submitted?
I'm expecting that every resource should register itself with a
discovery database (service) that is hosted on one or more servers,
and that when grid users (e.g. programmers/researchers) submit
problems, they submit them to some service that will dispatch them to
multiple resources (CPU donors) according to a scheduler and an
execution management plan that decides what to do in case of a failure.
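
As a toy illustration of the model described above (and not of how GT5 itself works), the following Python sketch has resources register themselves in an in-memory registry, and a dispatcher hands jobs out round-robin, moving on to the next resource when one fails. Every name in it (Registry, dispatch, max_attempts) is hypothetical, and each "resource" is just a local Python callable standing in for a remote machine.

```python
import itertools


class Registry:
    """In-memory stand-in for a discovery service."""

    def __init__(self):
        self._resources = {}

    def register(self, name, run):
        # 'run' is a callable that executes one job on that resource.
        self._resources[name] = run

    def names(self):
        return list(self._resources)

    def runner(self, name):
        return self._resources[name]


def dispatch(registry, jobs, max_attempts=3):
    """Assign jobs round-robin; on failure, retry on the next resource.

    Returns {job: (resource_name, result)}; a job that fails on
    max_attempts resources in a row is recorded as (None, None).
    """
    names = registry.names()
    if not names:
        raise RuntimeError("no resources registered")
    ring = itertools.cycle(names)
    results = {}
    for job in jobs:
        for _ in range(max_attempts):
            name = next(ring)
            try:
                results[job] = (name, registry.runner(name)(job))
                break
            except Exception:
                continue  # failure plan: try the next resource in the ring
        else:
            results[job] = (None, None)
    return results
```

A real system would replace the callables with remote submissions and the dictionary with a networked information service, but the division of labour is the same: registration, selection, and a failure policy.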
However, I fail to see how the above thoughts map to GT5 after
reading the quick start guide at
http://www.globus.org/toolkit/docs/5.2/5.2.2/admin/quickstart/.
What is in the guide is largely controlled by the user/programmer
(e.g. he specifies which computer to execute which commands on).
Rgrds,
J