Dougal, I am not sure what the GT4 equivalent is for file stage-in, but I know that the GT2 stage-in does detect when the same file has previously been staged in, and does not stage it in again. The cache on the far end holds many hard links to the same file.
Steve

On Mon, 31 May 2010, Dougal Ballantyne wrote:
Dear GT,

I have been working on a project for several months now, researching and developing a grid solution based on Globus Toolkit 4. Many thanks to the people who have helped me with previous issues. I have a slightly off-topic question about how others handle a particular scenario.

We have a job generation and control application to which we have added Globus support through some Perl modules that call globusrun-ws. When a job is generated, the program pulls the associated input files from the job database and creates an XML file that lists the input files under StageIn and the requested results file under StageOut. This works well for a single job, and for sets of jobs that each use different input data.

However, we often have a scenario where we generate several hundred jobs that all use the same input data. In our current setup, the same input file would be staged in several hundred times. I was wondering if there is a method or known best practice within the Globus Toolkit for handling this sort of scenario. I am aware that we could modify the tool to stage the data first, run the jobs, and then remove the input file, but that would also mean a change of workflow for the users.

Your thoughts or comments would be greatly appreciated.

Kind regards,
Dougal Ballantyne
-- 
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
[email protected]  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.
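Steve's description of the GT2 cache suggests a simple pattern: key each staged file by where it came from, transfer it only on a cache miss, and give every job a hard link to the single cached copy. The Python below is a minimal sketch of that idea under stated assumptions, not GT2's or GT4's actual implementation. The names `stage_in`, `local_copy` and the cache layout are hypothetical, the local copy stands in for a real transfer (for example GridFTP), and it assumes the input file does not change between the jobs that share it.

```python
import hashlib
import os
import shutil
import tempfile
from typing import Callable

# Hypothetical fetch signature: copy `source` to the local path `dest`.
Fetch = Callable[[str, str], None]


def local_copy(source: str, dest: str) -> None:
    # Stand-in for a real remote transfer; here just a local file copy.
    shutil.copyfile(source, dest)


def stage_in(source: str, job_dir: str, cache_dir: str,
             fetch: Fetch = local_copy) -> bool:
    """Make `source` available in `job_dir`, transferring it at most once.

    Returns True on a cache hit (no transfer was needed).
    Assumes the content behind `source` does not change between jobs.
    """
    os.makedirs(cache_dir, exist_ok=True)
    os.makedirs(job_dir, exist_ok=True)
    # Key on the source identifier rather than the contents, so a hit
    # avoids the transfer entirely.
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    cached = os.path.join(cache_dir, key)

    hit = os.path.exists(cached)
    if not hit:
        # Fetch into a temp file in the cache dir, then rename atomically so
        # other jobs never see a half-written cache entry.
        fd, tmp = tempfile.mkstemp(dir=cache_dir)
        os.close(fd)
        try:
            fetch(source, tmp)
            os.replace(tmp, cached)
        except BaseException:
            os.unlink(tmp)
            raise

    # Each job gets a hard link to the one cached copy: no extra disk space.
    link = os.path.join(job_dir, os.path.basename(source))
    if not os.path.exists(link):
        os.link(cached, link)
    return hit


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "input.dat")
        with open(src, "w") as f:
            f.write("shared input data\n")
        cache = os.path.join(work, "cache")
        hits = [stage_in(src, os.path.join(work, f"job{i:03d}"), cache)
                for i in range(300)]
        print("transfers:", hits.count(False))  # prints "transfers: 1"
```

Hard links only work when the cache and the job directories are on the same filesystem; a real deployment would also need a policy for evicting cache entries once no job links to them, which is roughly the clean-up step Dougal mentions wanting to avoid pushing onto users.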
