Dougal, I am not sure what the GT4 equivalent is for file
stage-in, but I know that the GT2 stage-in does detect when a
file has previously been staged in and does not stage it in again.
The cache on the far end has many hard links to the same file.
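
To illustrate the hard-link idea, here is a minimal Python sketch. It is not the actual GT2 cache code; the function name stage_in, the directory layout, and keying the cache on a content hash are all assumptions made for the example.

```python
import hashlib
import os
import shutil


def stage_in(source_path, cache_dir, job_dir):
    """Copy source_path into cache_dir once, then hard-link it into job_dir.

    Hypothetical helper: returns (link_path, copied), where copied says
    whether the file actually had to be transferred this time.
    """
    # Key the cache entry on the file contents (an assumption; a real
    # cache might key on the source URL instead).
    with open(source_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()

    os.makedirs(cache_dir, exist_ok=True)
    os.makedirs(job_dir, exist_ok=True)
    cached = os.path.join(cache_dir, digest)

    copied = False
    if not os.path.exists(cached):
        # First request for this file: do the expensive transfer once.
        shutil.copyfile(source_path, cached)
        copied = True

    # Each job gets its own name for the same inode; no data is copied.
    link = os.path.join(job_dir, os.path.basename(source_path))
    if not os.path.exists(link):
        os.link(cached, link)
    return link, copied
```

Calling stage_in for several hundred jobs with the same input would then transfer the data once and create one hard link per job. Note that hard links only work when the cache and the job directories are on the same filesystem.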

Steve


On Mon, 31 May 2010, Dougal Ballantyne wrote:

Dear GT,

I have been working on a project for several months now researching
and developing a grid solution based on Globus Toolkit 4. Many thanks
to people who have helped me with previous issues.

I have a slightly Off-Topic question related to how others handle a
particular scenario.

We have a job generation and control application to which we have added
Globus support through some Perl modules that call globusrun-ws.
When a job is generated, the program pulls from the job database the
associated input files and creates an XML file which lists the input
files in StageIn and the requested results file in StageOut. This
works great for a single job and jobs that all use different input
data. However, we often have a scenario where we generate several
hundred jobs that all use the same input data. In our current setup we
would StageIn the same input file several hundred times.

I was wondering if there was a method or known best practice within the
Globus Toolkit for handling this sort of scenario. I am aware that we
could modify the tool to stage the data first, run the jobs and then
remove the input file BUT that would also be a change of workflow for
the users.
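
As a rough sketch of that stage-first alternative (in Python rather than our Perl, and with hypothetical names such as plan_staging and job_finished that are not part of any Globus API), the tool could work out the distinct input files for a batch, stage each one once, and only remove a file when the last job that needs it has finished:

```python
from collections import Counter


def plan_staging(jobs):
    """Return (files_to_stage, refcounts) for a batch of jobs.

    jobs: dict mapping a job id to the list of input file URLs it needs
    (a made-up structure for this example).
    """
    refcounts = Counter()
    for inputs in jobs.values():
        # set() guards against a job listing the same file twice.
        refcounts.update(set(inputs))
    # dict.fromkeys keeps first-seen order, so the plan is deterministic.
    files_to_stage = list(dict.fromkeys(
        url for inputs in jobs.values() for url in inputs))
    return files_to_stage, refcounts


def job_finished(job_id, jobs, refcounts):
    """Decrement refcounts; return the files that are now safe to remove."""
    removable = []
    for url in set(jobs[job_id]):
        refcounts[url] -= 1
        if refcounts[url] == 0:
            removable.append(url)
    return sorted(removable)
```

With several hundred jobs sharing one input, files_to_stage would contain that file once, and it would only show up as removable after the final job completes.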

Your thoughts or comments would be greatly appreciated.

Kind regards,

Dougal Ballantyne


--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
[email protected]  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.
