Re: [gt-user] Globus Internal Architecture

Martin Feller Mon, 19 Jan 2009 10:58:20 -0800

Inderpreet Chopra wrote:

Thanks Martin for your help and support.
- In my opinion, communication through the local scheduler can be a problem if the scheduler dies before handing the job's state to GRAM. In that case the jobs will be executed, but the client will go on thinking that the jobs are still being executed or have timed out.


Yes, but as far as I know, GRAM was designed to be an interface to a local
resource manager (LRM). I think you'd have to duplicate a lot of functionality and
complexity already provided by an LRM if you wanted to monitor jobs the way
you suggest.
What if the monitoring daemon on one of the cluster nodes fails or dies?
I think at some point you'll always have to deal with failures, and leveraging
the monitoring capabilities of the existing LRMs doesn't sound like a bad
compromise to me.

- Another thing you mentioned is that the GRAM instance is no longer available in the grid. I am not able to understand that. Can you please give some more detail on this?


A Grid can consist of more than one site, e.g. 5 machines, located in
different networks and run by different resource providers, all offering
services like GRAM, RFT, GridFTP, MDS, etc.
If one of them goes down, the Grid itself is not dead; there are still
the other 4 resources available. Just this particular GRAM instance
is no longer available for use by clients.
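To make the point concrete, here is a minimal client-side sketch of what "the other resources are still available" can mean in practice. It is not part of the Globus client API: the function name `submit_with_failover`, the `submit` callable and the site names are all hypothetical, and a failed GRAM instance is simply modelled as a `ConnectionError`.

```python
def submit_with_failover(endpoints, submit):
    """Try each GRAM endpoint in order and return (endpoint, job_handle)
    from the first one that accepts the job.

    `submit` is a hypothetical callable standing in for whatever client
    call actually submits the job; it is assumed to raise ConnectionError
    when the GRAM instance at that endpoint is down.
    """
    errors = {}
    for endpoint in endpoints:
        try:
            return endpoint, submit(endpoint)
        except ConnectionError as exc:
            # This instance is unavailable; the grid as a whole is not.
            errors[endpoint] = exc
    raise RuntimeError(f"no GRAM instance reachable: {sorted(errors)}")
```

A job that was already accepted by the failed instance is a separate problem; this sketch only covers choosing another site for new submissions.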

- Martin, please correct me if I am wrong about the following flow (this can be the most generic flow from job submission to completion):
Java WS Core --> GRAM ---> Scheduler----> Job Execution nodes


Yes, looks ok to me.

So here WS Core provides the interface, or a way for the client to submit the job.

GRAM helps in submitting, monitoring and canceling the jobs.
The scheduler (PBS, Condor, etc.) schedules the jobs to different nodes.

- There is one component called MDS. Where does that fit into the above flow?


ws-gram registers information about the supported local resource managers
in WS-MDS. A client can get certain information about a cluster interfaced
by ws-gram by looking up information in MDS.
For more information I suggest this link:
http://www.globus.org/toolkit/docs/4.2/4.2.1/info/#info

- I want to start implementing my ideas to further improve the Globus processes. So, how do I set up the development environment under Eclipse? Is there a tutorial for that? I have read the Programmer's Guide for Globus, but found that it only gives the information necessary to build a web service. While studying, I found that there is an Eclipse plugin, GDTE, but I am not sure whether I will be able to build a new component on top of an existing Globus component (especially GRAM) using it.
What would be the correct way to start toying with the existing components?


I guess there is no single correct way. I personally use Eclipse,
but you can also use vi.
I do it by creating a CVS project, using ws-gram as the module, adding all Java
source directories to the source in the "Java Build Path" and adding a user
library that contains all GT jars.

Martin


Thanks & Regards
Inderpreet
Research Scholar
TU, Patiala

On Thu, Jan 15, 2009 at 4:48 AM, Martin Feller <[email protected]> wrote:


    Inderpreet Chopra wrote:

        Thanks Martin.
        Please see my comments inline.

        On Wed, Jan 14, 2009 at 2:49 AM, Martin Feller <[email protected]> wrote:

           Inderpreet Chopra wrote:

               Hi all,
               I am working on the fault-tolerance and security aspects of
               Grid, and I want to work on these two aspects in Globus.
               For this I want to know about the internal structure of Globus
               and its components.


           Do you mean Globus or the job management component in Globus?


        I am considering the job management component only. Thanks for
        providing the execution management links; they clear up many of my
        doubts.



               Is Globus using some async or sync queues for taking requests
               to execute jobs?

           I'm not entirely sure what you mean by that. A client does not hold
           a connection open until a job has finished processing completely, only
           until a job resource has been created on the server side, which the
           client can use to refer to its job. Once the job resource has been
           created, processing starts on the server side, without client
           interaction.


        What I want to ask here is this: a client requests job execution
        from GRAM. If there are few clients, GRAM will handle the requests
        from all of them. But what will happen if thousands of clients
        approach the grid system for job execution?
        So my question is: does GRAM maintain some queue into which it
        pushes requests if a large number of them arrive at one time?


    There is a queue (not in ws-gram, but in the underlying Java WS Core)
    that takes incoming requests for all services. Server threads pick
    requests from the queue, the desired service is called, and finally the
    response is sent back to the client.
    If the rate of incoming requests is higher than the rate at which requests
    are processed, the queue fills up, and at a certain point there will be
    connection timeouts. My guess is that most servers work that way and that
    the problem of 'Denial of Service' is not yet solved.
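The behaviour described above can be illustrated with a small, generic model. This is not the Java WS Core implementation, only a sketch of the same pattern under stated assumptions: a bounded queue, a fixed pool of worker threads, and requests that are rejected when the queue is full (standing in for the connection timeouts a real client would see). The function name `run_demo` and all parameter values are made up for the illustration.

```python
import queue
import threading
import time


def run_demo(num_requests=20, queue_size=5, workers=2, service_time=0.05):
    """Offer a burst of requests to a bounded queue served by a fixed
    number of worker threads. Returns (handled, rejected)."""
    pending = queue.Queue(maxsize=queue_size)
    handled = []
    lock = threading.Lock()

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: shut this worker down
                pending.task_done()
                break
            time.sleep(service_time)  # simulate calling the service
            with lock:
                handled.append(item)
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    rejected = 0
    for i in range(num_requests):
        try:
            pending.put_nowait(i)  # arrival rate far above service rate
        except queue.Full:
            rejected += 1  # a real client would see a timeout here

    pending.join()  # wait until every accepted request is processed
    for _ in threads:
        pending.put(None)
    for t in threads:
        t.join()
    return len(handled), rejected
```

With the default values the burst arrives much faster than two workers can drain the queue, so most of the requests are rejected; raising `queue_size` or `workers` only moves the point at which that happens.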





               What if some node fails in between? Is there some way to
               recover the pending task stuck on the erroneous node?

           What do you mean by "some node"? A client, the globus-server, or a
           machine of a cluster which was picked by the local resource manager
           (like PBS, Condor) to execute the job?



        Here by node I mean:
        - GRAM itself fails after taking the request from the client.
        Then, as far as I have read, the grid will be dead.


    I'd rather say this GRAM instance is no longer available in the grid,
    and not that the grid is dead.


        - The scheduling node fails, but it assigns jobs before going
        into the dead state. So my question is: does GRAM communicate
        with the execution nodes (the nodes that are actually assigned the
        execution of the job) directly, or through the scheduler, to get the
        current status of the job? If it communicates through the
        scheduler, then there is a problem.



    GRAM gets information about the jobs via the local resource manager
    (scheduler); it does not communicate with the execution nodes directly.
    More precisely, a program (the Scheduler Event Generator (SEG)) scans the
    logfile of the local resource manager and forwards the information about
    jobs to GRAM (GRAM4/ws-gram).
    For GRAM2/Pre-WS-GRAM there is also periodic polling using the job status
    query commands of the local resource manager.
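The log-scanning idea can be sketched roughly as follows. This is not the actual SEG code or its log format: the `timestamp;job_id;state` line layout and the function name `read_new_events` are assumptions made purely for illustration. The point is only that the scanner remembers how far it has read and picks up complete new lines on each pass.

```python
def read_new_events(log_path, offset):
    """Return (events, new_offset) for complete lines appended to the
    log since `offset`.

    Assumes a hypothetical line format 'timestamp;job_id;state'; real
    resource manager logs look different.
    """
    events = []
    with open(log_path, "rb") as log:
        log.seek(offset)
        while True:
            line = log.readline()
            if not line.endswith(b"\n"):
                # End of file, or a line that is still being written:
                # stop here and pick it up again on the next scan.
                break
            offset = log.tell()
            fields = line.decode("utf-8").strip().split(";")
            if len(fields) == 3:
                _timestamp, job_id, state = fields
                events.append((job_id, state))
    return events, offset
```

A scanner would call this in a loop, keep the returned offset between calls, and forward each `(job_id, state)` pair to whatever consumes job state changes. If the scanner restarts, it can resume from the last saved offset, because the state changes are still in the logfile.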

    Why is communication via the local resource manager necessarily a
    problem?

    Martin




               Can anyone please guide me in getting answers to my questions, and
               also point me to some documents describing the Globus internal architecture?


           The following webpage gives an overview of key concepts in GRAM (job
           management):

           http://www.globus.org/toolkit/docs/4.2/4.2.1/execution/key/#executionKey


           A starting point for more information about gram is this:
           http://www.globus.org/toolkit/docs/4.2/4.2.1/execution/

           Martin


        Hey, I am all new to Globus, so these might be stupid questions
        for you, but I want to clear up all doubts before actually starting
        work. Some leftovers that are still keeping me awake are:

        - If the job does not finish within the job lifetime limit,
        GRAM will cancel the job. Is there any means by which we can
        resubmit the job automatically? I guess we need to write some
        custom GRAM-like component.

        - What is the actual way of processing a single job? I guess it should
        be distributed to different nodes rather than the complete job being
        executed on a single node. If that is the case, how does GRAM manage
        the responses from the different nodes and combine them to reply to the client?



               Also, any suggestions related to my area of interest, i.e. fault
               tolerance and security in Globus? What is still pending that I
               can take up and work on?
                 Regards,
               Inderpreet


        Inderpreet
        Research Scholar
        TU, Patiala
