Hey Sanjaya,

Easy: see the example PGEConfig.xml here:

http://paste.apache.org/6OGW

In that file:

1. We compute the staged file path by computing JobDir
2. In the exe block, we create a staged input dir
3. We stage the files using plain cp commands in the exeBlock (we could
just as easily have used fileStager)
4. We know that the file is [JobInputDir]/[Filename]
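The four steps above can be sketched in Java roughly as follows. This is a minimal sketch, not the contents of the pasted PGEConfig.xml: the class and method names are hypothetical, and treating the job input directory as a subdirectory named "input" under JobDir is an assumption made only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper mirroring the steps: JobDir -> job input dir -> copy -> staged path.
class StagedFileSketch {

    // Step 1: derive the job input directory from JobDir.
    // ASSUMPTION: the input directory is a subdirectory called "input".
    static Path jobInputDir(Path jobDir) {
        return jobDir.resolve("input");
    }

    // Steps 2-4: create the input dir, copy the file in (like `cp`),
    // and return [JobInputDir]/[Filename].
    static Path stage(Path jobDir, Path sourceFile) throws IOException {
        Path inputDir = jobInputDir(jobDir);
        Files.createDirectories(inputDir);
        Path staged = inputDir.resolve(sourceFile.getFileName());
        Files.copy(sourceFile, staged, StandardCopyOption.REPLACE_EXISTING);
        return staged;
    }
}
```

Because the staged location is derived deterministically from JobDir and the file name, nothing extra has to be recorded to find the file again.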

HTH.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Sanjaya Medonsa <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, June 14, 2013 5:02 AM
To: Airavata Dev <[email protected]>
Subject: Re: Apache Airavata-OODT Integration

>Thanks, Chris, for your input. I actually use PGETaskInstance for file
>staging with minimal additional code, so my issue is not with the file
>staging itself. In my current implementation, the application takes a
>product ID as input, and the capabilities of the PGETaskInstance class are
>used to stage the file. My issue is that during staging the product is
>mapped to a file in the specified working directory, and I have no way to
>retrieve the staged file name, as it is not recorded in the Metadata (to
>work around this, I query the FileManager again to get the corresponding
>reference name for a given product ID). I need the staged file path
>because I replace the input product ID with the staged file path before
>the actual workflow is invoked. Basically, I am looking for an
>implementation where I can easily retrieve the staged file path for a
>given product ID.
>
>Cheers,
>Sanjaya
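One way to get a product-ID-to-staged-path lookup without re-querying the File Manager is to record the mapping at the moment the file is staged. This sketch assumes you control the staging code (for example, the pre-GFac handler that extends PGETaskInstance); StagedFileRegistry and its methods are hypothetical names, not part of OODT or Airavata.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: the staging step records where each product landed,
// so later code can look up the path by product ID without asking the File Manager again.
class StagedFileRegistry {
    private final Map<String, Path> stagedPaths = new ConcurrentHashMap<>();

    // Call this from the staging code right after a product's file has been copied.
    void recordStaged(String productId, Path stagedPath) {
        stagedPaths.put(productId, stagedPath.toAbsolutePath().normalize());
    }

    // Returns the staged path for a product ID, or empty if it was not staged in this run.
    Optional<Path> stagedPathFor(String productId) {
        return Optional.ofNullable(stagedPaths.get(productId));
    }
}
```

The trade-off is that the mapping only lives as long as the registry instance; persisting it (for example, as metadata on the job) would be needed if a different process has to read it.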
>
>
>On Wed, Jun 12, 2013 at 10:04 PM, Mattmann, Chris A (398J) <
>[email protected]> wrote:
>
>> Hi Sanjaya,
>>
>> -----Original Message-----
>>
>> From: Sanjaya Medonsa <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Monday, June 10, 2013 5:20 PM
>> To: "[email protected]" <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Subject: Re: Apache Airavata-OODT Integration
>>
>> >Hi Chris,
>> >       On configuration: I have gotten rid of all the configuration files,
>> >including pge-config.xml. All the required configurations are now set
>> >programmatically. Configurations such as the FileManagerServer URL are
>> >set in the airavata-server.properties file. I'll update the review
>> >request with the modified details.
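For illustration, reading the File Manager URL from airavata-server.properties instead of a separate config file could look like the sketch below. The property key oodt.filemanager.url and the http://localhost:9000 fallback are assumptions; use whatever key the Airavata patch actually defines.

```java
import java.io.IOException;
import java.io.Reader;
import java.net.URI;
import java.util.Properties;

class OodtSettings {
    // ASSUMPTION: hypothetical key name and default; match them to the real airavata-server.properties.
    static final String FILEMGR_URL_KEY = "oodt.filemanager.url";
    static final String DEFAULT_FILEMGR_URL = "http://localhost:9000";

    // Loads key=value pairs, e.g. from a reader over airavata-server.properties.
    static Properties load(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        return props;
    }

    // Returns the File Manager URL, falling back to the default when the key is absent.
    static URI fileManagerUrl(Properties props) {
        return URI.create(props.getProperty(FILEMGR_URL_KEY, DEFAULT_FILEMGR_URL).trim());
    }
}
```

In a handler, the reader would typically be opened over the server's properties file, e.g. with Files.newBufferedReader(path).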
>>
>> Great work!
>>
>>
>> >       I am still not quite clear on how to retrieve the staged file path
>> >properly. Currently I am using the getStagedFilePath method
>> >in ApacheAiravataWorkFlowInstanceImpl to regenerate the staged file path.
>> >While going through the OODT code, I saw a method in DataTransferer that
>> >notifies the FileManagerServer once a transfer is completed, but I
>> >couldn't find the same for product retrieval.
>>
>> Example:
>> http://svn.apache.org/repos/asf/oodt/trunk/pge/src/test/resources/pge-config.xml
>>
>>
>> Review Board tickets:
>> https://reviews.apache.org/r/4746/
>>
>> https://reviews.apache.org/r/5382/
>>
>>
>> JIRA issue source (in OODT since 0.4):
>>   https://issues.apache.org/jira/browse/OODT-443
>>
>>
>> >       As you suggested, I'll improve my workflow using Apache Tika. I'd
>> >like to continue this as a parallel task. While modifying the staging
>> >implementation based on community feedback, I am currently looking at
>> >ingesting output back into OODT.
>>
>> See above for info on file staging. I would strongly encourage you not
>> to reimplement CAS-PGE in Airavata -- it's pretty functional and
>> expressive anyway, and I would work to figure out how to make Airavata
>> leverage CAS-PGE.
>>
>> Cheers,
>> Chris
>>
>> >
>> >On Wed, Jun 5, 2013 at 12:11 AM, Mattmann, Chris A (398J) <
>> >[email protected]> wrote:
>> >
>> >> Hi Sanjaya,
>> >>
>> >> I think starting out with /bin/ls would be good, maybe like a /bin/ls
>> >> workflow, and then for each file returned, maybe run Apache Tika and
>> >> extract its metadata and then pipe that to a file?
>> >>
>> >> How about that?
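A minimal Java sketch of that /bin/ls-then-extract pipeline, assuming a plain directory listing stands in for /bin/ls. The extractor is a pluggable function; basicMetadata is a hypothetical stand-in that only records file size, and in the real workflow that step would call Apache Tika instead. The "filename<TAB>key=value" output format is an arbitrary choice for this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LsTikaWorkflowSketch {

    // Stand-in extractor (hypothetical); the real workflow would call Apache Tika here.
    static Map<String, String> basicMetadata(Path file) {
        Map<String, String> md = new TreeMap<>();
        try {
            md.put("size", Long.toString(Files.size(file)));
        } catch (IOException e) {
            md.put("error", String.valueOf(e.getMessage()));
        }
        return md;
    }

    // "ls" step, then one extraction per file, written as "file<TAB>key=value" lines.
    static List<String> run(Path dir, Function<Path, Map<String, String>> extractor, Path out)
            throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {               // the /bin/ls step
            files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path f : files) {
            for (Map.Entry<String, String> e : extractor.apply(f).entrySet()) {
                lines.add(f.getFileName() + "\t" + e.getKey() + "=" + e.getValue());
            }
        }
        Files.write(out, lines);                                       // "pipe that to a file"
        return lines;
    }
}
```

Keeping the extractor as a parameter means the Tika call can be swapped in without changing the listing or output steps.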
>> >>
>> >> Cheers,
>> >> Chris
>> >>
>> >>
>> >> -----Original Message-----
>> >> From: Sanjaya Medonsa <[email protected]>
>> >> Reply-To: "[email protected]" <[email protected]>
>> >> Date: Tuesday, June 4, 2013 5:31 AM
>> >> To: "[email protected]" <[email protected]>
>> >> Cc: "[email protected]" <[email protected]>
>> >> Subject: Re: Apache Airavata-OODT Integration
>> >>
>> >> >Hi Chris,
>> >> >     Please see my comments below on the two items.
>> >> >
>> >> >Configuration: It should be possible to set them programmatically.
>> >> >In fact, I have partly implemented this for the file staging
>> >> >information. I'll work to get rid of the other configuration files.
>> >> >
>> >> >Staged File Path: I'll work on the suggested approach, though I don't
>> >> >fully understand it at the moment. I guess I need to go through a bit
>> >> >more of CAS-PGE and come back to you on the proposed approach.
>> >> >
>> >> >Currently I am testing this by wrapping the /bin/ls command as a GFac
>> >> >service. I may need to test this with a real workflow. Could you
>> >> >please give me some guidance on a better scenario to test this?
>> >> >
>> >> >Cheers,
>> >> >Sanjaya
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >On Mon, Jun 3, 2013 at 8:17 PM, Mattmann, Chris A (398J) <
>> >> >[email protected]> wrote:
>> >> >
>> >> >> Hi Sanjaya,
>> >> >>
>> >> >> -----Original Message-----
>> >> >>
>> >> >> From: Sanjaya Medonsa <[email protected]>
>> >> >> Reply-To: "[email protected]" <[email protected]>
>> >> >> Date: Thursday, May 30, 2013 5:12 AM
>> >> >> To: "[email protected]" <[email protected]>,
>> >> >> "[email protected]" <[email protected]>
>> >> >> Subject: Apache Airavata-OODT Integration
>> >> >>
>> >> >> >Hi,
>> >> >> >     I have worked on the Apache Airavata integration with Apache
>> >> >> >OODT. As a first step, I have implemented integration with the
>> >> >> >Apache OODT File Manager component.
>> >> >>
>> >> >> Great work!!
>> >> >>
>> >> >> Comments below:
>> >> >>
>> >> >> >      1. Introduced a new GFac schema type called OODTProduct, which
>> >> >> >takes Apache OODT product IDs as input.
>> >> >> >      2. Implemented a new pre-GFac handler by extending Apache OODT
>> >> >> >PGETaskInstance to stage the corresponding file into the working
>> >> >> >directory.
>> >> >> >      3. Once the file is staged, the input parameter holding the
>> >> >> >OODT product ID is replaced with the path of the staged file for
>> >> >> >downstream processing.
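Step 3 amounts to rewriting the input map before the application runs. The sketch below is illustrative only: the Input holder, the "OODTProduct" type-string comparison, and the stageProduct callback (which would wrap the PGETaskInstance-based staging) are assumptions, not GFac's actual handler API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class ProductInputRewriter {
    // Hypothetical input holder: parameter type (e.g. "OODTProduct") plus its value.
    static final class Input {
        final String type;
        final String value;
        Input(String type, String value) { this.type = type; this.value = value; }
    }

    // Every OODTProduct input (a product ID) is replaced by the path of its staged file;
    // all other inputs pass through unchanged. Insertion order is preserved.
    static Map<String, String> rewrite(Map<String, Input> inputs, Function<String, String> stageProduct) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, Input> e : inputs.entrySet()) {
            Input in = e.getValue();
            String value = "OODTProduct".equals(in.type) ? stageProduct.apply(in.value) : in.value;
            resolved.put(e.getKey(), value);
        }
        return resolved;
    }
}
```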
>> >> >> >
>> >> >> >I have tested the implementation with a GFac application that wraps
>> >> >> >the /bin/ls command. The application takes a product ID as input,
>> >> >> >stages the corresponding file into the working directory, and runs
>> >> >> >/bin/ls against the staged file. I hope this is a valid testing
>> >> >> >scenario.
>> >> >> >
>> >> >> >Concerns
>> >> >> >- Configurations: I have added a new configuration file named
>> >> >> >oodt-integration.properties in addition to the dynamic_metadata.met
>> >> >> >and pge-config.xml files used by OODT. At the moment, however, no
>> >> >> >items are configured in oodt-integration.properties.
>> >> >>
>> >> >> You probably only need the pge-config.xml file. The dynamic metadata
>> >> >> and the task configuration properties can be specified
>> >> >> programmatically, right?
>> >> >>
>> >> >> >- Staged File Name: With the current implementation of
>> >> >> >PGETaskInstance, it is not possible to retrieve the path of the
>> >> >> >staged file. Due to this limitation, I query the FileManagerServer
>> >> >> >with the product ID to retrieve the file name, and compute the file
>> >> >> >path using the working directory information.
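The workaround described here (look up the product's reference, then combine its file name with the working directory) might look like the sketch below. It assumes the reference is a file: URI and that the stager keeps the original file name; in OODT the string would likely come from the product's Reference object (e.g. Reference.getDataStoreReference()), but check that against the OODT version in use. StagedPathFromReference is a hypothetical name.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

class StagedPathFromReference {
    // Maps an archived file reference (a file: URI) to its location in the job's working dir,
    // ASSUMING a copy-based stager that keeps the original file name.
    static Path stagedPath(String dataStoreReference, Path workingDir) {
        Path archived = Paths.get(URI.create(dataStoreReference)); // also decodes %20 etc.
        return workingDir.resolve(archived.getFileName().toString());
    }
}
```

Parsing with java.net.URI rather than slicing the string means escaped characters such as %20 in the reference are decoded correctly.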
>> >> >>
>> >> >> I'm not sure I understand this. If you store and record the Filename
>> >> >> and FileLocation metadata, then you can easily retrieve the staged
>> >> >> file path via a SQL query in CAS-PGE by simply setting
>> >> >> FORMAT=('$FileLocation/$Filename') in the response.
>> >> >> Can you comment on this?
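To illustrate what that FORMAT string does, the sketch below expands $Key placeholders from a metadata map. It only mimics the idea; it is not CAS-PGE's actual substitution code, and leaving unknown keys untouched is a choice made for this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetadataFormatSketch {
    private static final Pattern VAR = Pattern.compile("\\$(\\w+)");

    // Replaces each $Key in the format with the metadata value for Key;
    // keys missing from the metadata are left as written.
    static String expand(String format, Map<String, String> metadata) {
        Matcher m = VAR.matcher(format);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = metadata.get(m.group(1));
            String replacement = value != null ? value : m.group(0);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

With Filename and FileLocation recorded as metadata, expand("$FileLocation/$Filename", md) yields the full file path.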
>> >> >>
>> >> >> >- Currently it is not possible to execute the workflow using Xbaya,
>> >> >> >because validation fails on the new schema type. I have commented
>> >> >> >out the relevant validation code for testing purposes.
>> >> >>
>> >> >> OK, will probably need to work on this.
>> >> >>
>> >> >> >
>> >> >> >Currently I am having an issue with the Review Board client tool and
>> >> >> >need to resolve it before I can upload the code for review.
>> >> >>
>> >> >> I see later that you got this working, so I will head over and
>> >> >> review that now.
>> >> >>
>> >> >> Thanks!
>> >> >>
>> >> >> Cheers,
>> >> >> Chris
>> >> >>