Re: [gt-user] [Slightly OT] Handling repeated input files

Dougal Ballantyne Tue, 01 Jun 2010 14:10:55 -0700

Adam,

Thank you for the very helpful information. Your thesis is an
informative piece of work. I have hit several issues you have also
addressed.


I will investigate further the use of RLS as an index of file hashes
that are present in the grid data cache.

It still leaves the challenge of what to do when you submit 100 jobs at
the same time vs. 1 + 99 jobs. (Something I think we can probably just
adopt as a best practice.)

Kind regards,

Dougal


On Tue, Jun 1, 2010 at 12:37 AM, Adam Bazinet <[email protected]> wrote:
> We faced a similar problem with our system.  We used RLS to implement a file
> caching scheme that, in essence, checks the RLS database to see if the file
> already exists on the remote resource; if it does, we don't stage it in.
> For more details, see my master's thesis linked from here:
> http://www.cbcb.umd.edu/~pknut777/
>
> regards,
> Adam
>
> On Mon, May 31, 2010 at 4:04 PM, Dougal Ballantyne <[email protected]>
> wrote:
>>
>> Dear GT,
>>
>> I have been working on a project for several months now researching
>> and developing a grid solution based on Globus Toolkit 4. Many thanks
>> to people who have helped me with previous issues.
>>
>> I have a slightly Off-Topic question related to how others handle a
>> particular scenario.
>>
>> We have a job generation and control application to which we have added
>> Globus support through some Perl modules that call globusrun-ws.
>> When a job is generated, the program pulls from the job database the
>> associated input files and creates an XML file which lists the input
>> files in StageIn and the requested results file in StageOut. This
>> works great for a single job and jobs that all use different input
>> data. However, we often have a scenario where we generate several
>> hundred jobs that all use the same input data. In our current setup we
>> would StageIn the same input file several hundred times.
>>
>> I was wondering if there was a method or known best practice within the
>> Globus Toolkit for handling this sort of scenario. I am aware that we
>> could modify the tool to stage the data first, run the jobs and then
>> remove the input file BUT that would also be a change of workflow for
>> the users.
>>
>> Your thoughts or comments are greatly appreciated.
>>
>> Kind regards,
>>
>> Dougal Ballantyne
>
>
