Re: [gt-user] Globus Internal Architecture

Inderpreet Chopra Thu, 15 Jan 2009 11:15:22 -0800

Thanks Martin for your help and support.

- In my opinion, communication through the local scheduler can be a
problem if the scheduler dies before handing the job state over to GRAM. In
that case the jobs will be executed, but the client will keep believing
that the jobs are still running or have timed out.
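
To make this concern concrete, here is a minimal sketch of what I mean. It
assumes a made-up log format of "<jobId> <STATE>" per line, which is not the
real format of any scheduler log, and the class and method names are
hypothetical. It only shows that a log-driven view of job state can never be
newer than the last line the scheduler managed to write.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only. The "<jobId> <STATE>" log format is invented
// for this sketch and is not the format used by PBS, Condor, or the SEG.
class LogScanSketch {

    // Replays the log lines in order and returns the last state seen per job.
    static Map<String, String> lastKnownStates(List<String> logLines) {
        Map<String, String> states = new LinkedHashMap<>();
        for (String line : logLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // skip blank or malformed lines
            }
            states.put(parts[0], parts[1]); // a later line overwrites an earlier one
        }
        return states;
    }
}
```

If the scheduler wrote "job1 ACTIVE" and then died before writing a final
state, the reader of the log still sees job1 as ACTIVE, even if the job has
finished on the execution node in the meantime.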



- Another thing you mentioned is that the GRAM instance is no longer
available in the grid. I was not able to understand that. Could you please
give some more detail on this?


- Martin, please correct me if I am wrong about the following flow (this may
be the most generic flow from job submission to completion):
Java WS Core --> GRAM ---> Scheduler----> Job Execution nodes

So here WS Core provides the interface through which the client submits
the job.
GRAM helps in submitting, monitoring and canceling the jobs.
The scheduler (PBS, Condor, etc.) schedules the jobs onto different nodes.

- There is one component called MDS. Where does that fit into the above flow?

- I want to start implementing my ideas to further improve the Globus
processes. How do I set up the development environment in Eclipse? Is
there a tutorial for that? I have read the programmer's guide for
Globus, but found that it only gives the information necessary to build a
web service.
While studying, I found that there is an Eclipse plugin, GDTE, but I am not
sure whether I will be able to build a new component on top of the existing
Globus components (especially GRAM) using it.

What would be the correct way to start experimenting with the existing
components?


Thanks & Regards
Inderpreet
Research Scholar
TU, Patiala


On Thu, Jan 15, 2009 at 4:48 AM, Martin Feller <[email protected]> wrote:

> Inderpreet Chopra wrote:
>
>> Thanks Martin.
>> Please see my comments inline.
>>
>> On Wed, Jan 14, 2009 at 2:49 AM, Martin Feller <[email protected]> wrote:
>>
>>    Inderpreet Chopra wrote:
>>
>>        Hi all
>>        I am working on the fault-tolerance and security aspects of the
>>        Grid. I want to work on these two aspects in Globus.
>>        For this I want to know about the internal structure of Globus
>>        and its components.
>>
>>
>>    Do you mean Globus or the job management component in Globus?
>>
>>
>> I am concerned with the job management component only. Thanks for providing
>> the Execution Management links; they cleared up many of my doubts.
>>
>>
>>
>>        Does Globus use async or sync queues for taking requests
>>        to execute jobs?
>>
>>
>>    I'm not entirely sure what you mean by that. A client does not hold a
>>    connection open until a job has finished processing completely, only
>>    until a job resource has been created on the server side, which is used
>>    so that a client can refer to its job. Once the job resource has been
>>    created, the processing starts on the server side, without client
>>    interaction.
>>
>>
>> What I want to ask here is this: a client asks GRAM to execute a job.
>> If there are only a few clients, GRAM will handle the requests from all of
>> them. But what happens if thousands of clients try to approach the grid
>> system for job execution?
>> So my question is: does GRAM maintain some queue into which it pushes the
>> requests if a large number of them arrive at one time?
>>
>>
>
> There is a queue (not in ws-gram, but in the underlying Java WS Core),
> that takes incoming requests for all services. Server threads pick requests
> from the queue and the desired service is called, and finally the response
> is sent back to the client.
> If the rate of incoming requests is higher than the rate of requests being
> processed, the queue fills up, and at a certain point there will be
> connection timeouts. My guess is that most servers work that way and that
> the problem of 'Denial of Service' is not yet solved.
>
>
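
The queue behaviour described above can be sketched roughly as follows. This
is only an illustration under stated assumptions, not Java WS Core code: the
class name, the capacity, and the `submit` method are hypothetical. It uses a
bounded `ArrayBlockingQueue` with worker threads, and a `submit` call that
gives up after a wait period stands in for the connection timeout a client
would see once the queue is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; this is not how Java WS Core is implemented.
class RequestQueueSketch {
    private final BlockingQueue<Runnable> queue;
    final AtomicInteger processed = new AtomicInteger();

    RequestQueueSketch(int capacity, int workerCount) {
        queue = new ArrayBlockingQueue<>(capacity);
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(this::serve);
            worker.setDaemon(true); // do not keep the JVM alive for idle workers
            worker.start();
        }
    }

    // Each server thread takes the next request from the queue and runs it.
    private void serve() {
        try {
            while (true) {
                Runnable request = queue.take();
                request.run();
                processed.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns false if the queue stayed full for the whole wait period,
    // which plays the role of a timeout on the client side.
    boolean submit(Runnable request, long waitMillis) {
        try {
            return queue.offer(request, waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a capacity of 2 and no worker threads draining the queue, the first two
calls to `submit` succeed and the third one returns false after its wait
period, which is the overload case in miniature.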
>>
>>
>>        What if some node fails partway through? Is there some way to
>>        recover the pending tasks stuck on the erroneous node?
>>
>>
>>    What do you mean by "some node"? A client, the globus-server, or a
>>    machine of a cluster which was picked by the local resource manager
>>    (like PBS, Condor) to execute the job?
>>
>>
>>
>> Here by node I mean:
>> - GRAM itself fails after taking the request from the client. Then, as far
>> as I have read, the grid will be dead.
>>
>
> I'd rather say that this GRAM instance is no longer available in the grid,
> not that the grid is dead.
>
>> - The scheduling node fails, but it assigns jobs before going into the dead
>> state. So my question is: does GRAM communicate with the execution nodes
>> (the nodes that are actually assigned the execution of the job) directly, or
>> through the scheduler, to get the current job state? If it communicates
>> through the scheduler, then there is a problem.
>>
>>
>>
> GRAM gets information about the jobs via the local resource manager
> (scheduler); it does not communicate with the execution nodes directly.
> More precisely, a program (the Scheduler Event Generator (SEG)) scans the
> logfile of the local resource manager and forwards the information about
> jobs to GRAM (GRAM4/ws-gram).
> For GRAM2/Pre-WS-GRAM there is also periodic polling using the job status
> query commands of the local resource manager.
>
> Why is communication via the local resource manager necessarily a problem?
>
> Martin
>
>
>
>>
>>        Can anyone please guide me in getting answer to my questions and
>>        also some documents describing globus internal architecture.
>>
>>
>>    The following webpage gives an overview of the key concepts of GRAM
>>    (job management):
>>
>> http://www.globus.org/toolkit/docs/4.2/4.2.1/execution/key/#executionKey
>>
>>    A starting point for more information about gram is this:
>>    http://www.globus.org/toolkit/docs/4.2/4.2.1/execution/
>>
>>    Martin
>>
>>
>> I am completely new to Globus, so these may all be stupid questions to
>> you, but I want to clear up all my doubts before actually starting
>> work. Some leftover questions that are still keeping me awake are:
>>
>> - If the job does not finish within the job lifetime limit, GRAM will
>> cancel the job. Is there any means by which we can resubmit the job
>> automatically? I guess we would need to write some custom GRAM-like
>> component.
>>
>> - What is the actual way of processing a single job? I guess it should be
>> distributed to different nodes rather than the complete job being executed
>> on a single node. If that is the case, how does GRAM manage the responses
>> from the different nodes and combine them into a reply to the client?
>>
>>
>>
>>        Also, any suggestions related to my area of interest, i.e. fault
>>        tolerance and security in Globus? What is still pending that I
>>        can take up and work on?
>>          Regards,
>>        Inderpreet
>>
>>
>>
>> Inderpreet
>> Research Scholar
>> TU, Patiala
>>
>
>
