Re: [gt-user] GT4 job management

txcom2003 Fri, 20 Jun 2008 02:29:44 -0700

> On Jun 19, 2008, at 6:40 PM, [EMAIL PROTECTED] wrote:
>> Hi,
>> Thanks for your reply.
>>
>> I think there must be some place where each job's state information
>> is stored. But where and how is that information stored and
>> retrieved?


>
> Yes.  At various times the job state information is written to and
> read from flat files.  We're looking into alternatively using a
> database to store the information.  Possibly coming sometime in the
> 4.2 series.
>
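
To make the flat-file idea concrete, here is a minimal sketch of persisting job state to files and reading it back after a restart. It is not GRAM's actual on-disk format or code: the directory layout, the JSON encoding, the record fields and the function names `save_job_state` and `load_job_states` are all assumptions for illustration only.

```python
import json
import os
import tempfile


def save_job_state(state_dir, job_id, record):
    """Write one job's record to <state_dir>/<job_id>.json (hypothetical layout).

    The record is written to a temporary file first and then renamed, so a
    crash in the middle of a write never leaves a half-written state file.
    """
    os.makedirs(state_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, os.path.join(state_dir, job_id + ".json"))


def load_job_states(state_dir):
    """Read every persisted record back, for example after a restart."""
    jobs = {}
    for name in os.listdir(state_dir):
        if name.endswith(".json"):
            with open(os.path.join(state_dir, name)) as f:
                jobs[name[:-len(".json")]] = json.load(f)
    return jobs
```

Each state change overwrites the previous record for that job, so whatever is on disk after a crash is the last state that was fully written.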

So where is that information stored in Globus Toolkit 4.0.7? And how
does Globus manage that information to guard against the failures I
mentioned before?

Thanks

Tonny

>>
>>> Hi Tonny,
>>>
>>> GRAM is fault tolerant, meaning that when/if the container or service
>>> host crashes, the job details are not lost.  When the GRAM4 service is
>>> restarted, then the processing/monitoring of the job resumes.  GRAM2
>>> requires user/client intervention to restart the processing of the job.
>>>
>>> If the job included file stage-in directives and those had not
>>> completed at the time of the crash, then GRAM will resume processing
>>> the job from that job state and continue until the job has been fully
>>> processed.
>>>
>>> If the job had already been submitted to the local resource manager,
>>> then GRAM will resume monitoring the job in the LRM and continue
>>> processing the job to completion.  GRAM persists the LRM job id.  If
>>> the crash included the LRM and the LRM is also fault tolerant and
>>> resumes processing of the job, then the job will be completely
>>> processed without requiring any client intervention.
>>>
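
The recovery behaviour described above (resume stage-in if it was unfinished, otherwise resume monitoring by the persisted LRM job id) can be sketched as a small decision function. This is only an illustration under assumptions: the state names, the record fields, the action labels and the `lrm_status` callback are hypothetical and are not taken from the GRAM source.

```python
def recover(jobs, lrm_status):
    """Decide what to do with each persisted job after a service restart.

    jobs:       dict mapping job id -> {"state": ..., "lrm_job_id": ...}
    lrm_status: hypothetical callable that takes an LRM job id and returns
                "running", "done" or "unknown"
    """
    actions = {}
    for job_id, rec in jobs.items():
        if rec["state"] == "StageIn":
            # Staging had not finished at crash time: pick it up again.
            actions[job_id] = "resume-stage-in"
        elif rec["state"] in ("Pending", "Active"):
            # Already handed to the LRM: use the persisted LRM job id.
            status = lrm_status(rec["lrm_job_id"])
            if status == "running":
                actions[job_id] = "resume-monitoring"
            elif status == "done":
                actions[job_id] = "continue-after-execution"
            else:
                # Assumption: an LRM that no longer knows the job needs
                # attention; the thread does not say what GRAM does here.
                actions[job_id] = "needs-attention"
        else:
            actions[job_id] = "nothing-to-do"
    return actions
```

The point of the sketch is that the decision depends only on what was persisted before the crash, so no client involvement is needed to choose the next step.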
>>> A persistent connection between the GRAM client and service is not
>>> maintained, so network failures between the client and service can be
>>> overcome.
>>>
>>> In GRAM4 (WS GRAM), an EPR is included in the reply to
>>> createManagedJob.  This allows the client to contact the service when
>>> desired to get the current job status, cancel the job, subscribe for
>>> notifications, ...
>>>
>>> If the createManagedJob call is received by the GRAM service, but the
>>> reply (containing the EPR) is not received by the client (possibly due
>>> to network failure), then GRAM4 provides the means to subsequently get
>>> the EPR in order to control the previously submitted job.
>>> Details about that are here:
>>> http://www-unix.globus.org/toolkit/docs/4.0/execution/wsgram/user-index.html#s-wsgram-user-submissionid
>>>
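
The lost-reply case is the classic argument for an idempotent submit call. The sketch below assumes the client picks a unique submission id and repeats it on every retry, so the service can return the handle of the job it already created instead of creating a second one. `JobService`, `create_job` and `submit_with_retry` are toy names invented for this illustration; they are not the WS GRAM API, and the linked documentation is the authority on how GRAM4 actually does it.

```python
import uuid


class JobService:
    """Toy stand-in for a job service (hypothetical, not the GRAM API)."""

    def __init__(self):
        self._jobs = {}   # submission id -> job handle
        self.created = 0  # number of jobs actually created

    def create_job(self, submission_id, description):
        # A repeated submission id returns the existing handle rather
        # than creating a duplicate job.
        if submission_id not in self._jobs:
            self.created += 1
            self._jobs[submission_id] = "job-handle-%d" % self.created
        return self._jobs[submission_id]


def submit_with_retry(service, description, send, attempts=3):
    """Submit once logically, retrying with the same id if a reply is lost.

    send is a callable (service, submission_id, description) -> handle that
    may raise ConnectionError when the reply does not arrive.
    """
    submission_id = str(uuid.uuid4())
    for _ in range(attempts):
        try:
            return submission_id, send(service, submission_id, description)
        except ConnectionError:
            continue
    raise ConnectionError("no reply after %d attempts" % attempts)
```

Because the id is chosen before the first attempt, the client can retry safely even when it cannot tell whether the service received the original request.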
>>> Lemme know if you have any more questions on this.
>>>
>>> Regards,
>>> -Stu
>>>
>>> On Jun 19, 2008, at 10:00 AM, [EMAIL PROTECTED] wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I don't quite understand how GT4 manages a submitted job when
>>>> failures happen, for example a lost connection with the client
>>>> caused by a temporary network failure, or lost contact with the LRM
>>>> caused by Globus being restarted during job execution.
>>>>
>>>> Does anybody know about this?
>>>>
>>>> Regards
>>>>
>>>> Tonny
>>>>
