Hey Sanjaya,

Easy: see the attached PGEConfig.xml here:
http://paste.apache.org/6OGW

In that file:
1. We compute the staged file path by computing JobDir.
2. We create a staged input dir in the exe block.
3. We stage the files just using cp commands in the exe block (we could just as easily have used the fileStager).
4. We then know that the staged file is [JobInputDir]/[Filename].

HTH.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Sanjaya Medonsa <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, June 14, 2013 5:02 AM
To: Airavata Dev <[email protected]>
Subject: Re: Apache Airavata-OODT Integration

>Thanks Chris for your input. I actually use PGETaskInstance for file
>staging with minimal additional code, but my issue is not with the file
>staging. In my current implementation, the application takes a product id
>as input, and the capabilities of the PGETaskInstance class do the file
>staging. The problem is that during staging the product is mapped to a
>file in the specified working directory, and I don't have a way to
>retrieve the staged file name, because it is not recorded in the Metadata.
>(To work around this, I query the FileManager again to get the
>corresponding reference name for a given product id.) I need the staged
>file path because I replace the input product id with the staged file
>path prior to the actual workflow invocation. Basically, I am looking for
>an implementation where I can easily retrieve the staged file path for a
>given product id.
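Chris's four steps reduce the staged-file lookup to path arithmetic: once the job input directory is known, the staged file is simply [JobInputDir]/[Filename]. The Java sketch below illustrates that idea together with the input rewrite Sanjaya describes (product id replaced by the staged path). It deliberately avoids the OODT and Airavata APIs; the class name StagedPathSketch, the "input" sub-directory under JobDir, and the productId-to-filename map (standing in for a File Manager query) are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating the staging steps; not an OODT or Airavata API.
public class StagedPathSketch {

    // Step 1 (assumption): the job input directory is an "input" sub-directory of JobDir.
    static Path jobInputDir(Path jobDir) {
        return jobDir.resolve("input");
    }

    // Step 4: the staged file is always [JobInputDir]/[Filename].
    static Path stagedFilePath(Path jobInputDir, String filename) {
        return jobInputDir.resolve(filename);
    }

    // Steps 2 and 3: create the input directory, then copy ("cp") the product file into it.
    static Path stage(Path sourceFile, Path jobInputDir) throws IOException {
        Files.createDirectories(jobInputDir);
        Path target = stagedFilePath(jobInputDir, sourceFile.getFileName().toString());
        Files.copy(sourceFile, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    // Replace every input whose value is a known product id with its staged path.
    // productFilenames stands in for a File Manager lookup (hypothetical).
    static Map<String, String> rewriteInputs(Map<String, String> inputs,
                                             Map<String, String> productFilenames,
                                             Path jobInputDir) {
        Map<String, String> rewritten = new HashMap<>(inputs);
        for (Map.Entry<String, String> e : inputs.entrySet()) {
            String filename = productFilenames.get(e.getValue());
            if (filename != null) {
                rewritten.put(e.getKey(), stagedFilePath(jobInputDir, filename).toString());
            }
        }
        return rewritten;
    }

    public static void main(String[] args) throws IOException {
        Path jobDir = Files.createTempDirectory("job");
        Path product = Files.createTempFile("product", ".dat");
        Path staged = stage(product, jobInputDir(jobDir));
        System.out.println(Files.exists(staged)); // true
    }
}
```

Because the staged location is fully determined by JobInputDir and Filename, the path can be recomputed after staging rather than recorded separately, which appears to be the point of step 4.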
>
>Cheers,
>Sanjaya
>
>
>On Wed, Jun 12, 2013 at 10:04 PM, Mattmann, Chris A (398J) <
>[email protected]> wrote:
>
>> Hi Sanjaya,
>>
>> -----Original Message-----
>>
>> From: Sanjaya Medonsa <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Monday, June 10, 2013 5:20 PM
>> To: "[email protected]" <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Subject: Re: Apache Airavata-OODT Integration
>>
>> >Hi Chris,
>> > On configuration, I have got rid of all the configuration files,
>> >including pge-config.xml. All the required configuration is now set
>> >programmatically, and settings such as the FileManagerServer URL are
>> >configured in the airavata-server.properties file. I'll update the
>> >review request with the modified details.
>>
>> Great work!
>>
>> > I am still not quite clear on how to retrieve the staged file path
>> >properly. Currently I am using the getStagedFilePath method in
>> >ApacheAiravataWorkFlowInstanceImpl to regenerate the staged file path.
>> >While going through the OODT code, I saw a method in DataTransferer
>> >that notifies the FileManagerServer once a transfer is completed, but
>> >I couldn't find the equivalent for product retrieval.
>>
>> Example:
>> http://svn.apache.org/repos/asf/oodt/trunk/pge/src/test/resources/pge-config.xml
>>
>> Review Board tickets:
>> https://reviews.apache.org/r/4746/
>> https://reviews.apache.org/r/5382/
>>
>> JIRA issue source (in OODT since 0.4):
>> https://issues.apache.org/jira/browse/OODT-443
>>
>> > As you suggested, I'll improve my workflow using Apache Tika. I'd
>> >like to continue this as a parallel task. While modifying the staging
>> >implementation based on community feedback, I am currently looking at
>> >ingesting output back into OODT.
>>
>> See above for info on file staging. I would strongly encourage you not
>> to reimplement CAS-PGE in Airavata: it's pretty functional and
>> expressive, and I would work on figuring out how to make Airavata
>> leverage CAS-PGE.
>>
>> Cheers,
>> Chris
>>
>> >On Wed, Jun 5, 2013 at 12:11 AM, Mattmann, Chris A (398J) <
>> >[email protected]> wrote:
>> >
>> >> Hi Sanjaya,
>> >>
>> >> I think starting out with /bin/ls would be good: maybe a /bin/ls
>> >> workflow, and then for each file returned, run Apache Tika to
>> >> extract its metadata and pipe that to a file?
>> >>
>> >> How about that?
>> >>
>> >> Cheers,
>> >> Chris
>> >>
>> >> -----Original Message-----
>> >> From: Sanjaya Medonsa <[email protected]>
>> >> Reply-To: "[email protected]" <[email protected]>
>> >> Date: Tuesday, June 4, 2013 5:31 AM
>> >> To: "[email protected]" <[email protected]>
>> >> Cc: "[email protected]" <[email protected]>
>> >> Subject: Re: Apache Airavata-OODT Integration
>> >>
>> >> >Hi Chris,
>> >> > Please see my comments below on the two items.
>> >> >
>> >> >Configuration: It should be possible to set these programmatically.
>> >> >I have actually implemented this partly for the file staging
>> >> >information. I'll work on getting rid of the other configuration
>> >> >files.
>> >> >
>> >> >Staged File Path: I'll work on the suggested approach, though I
>> >> >don't fully understand it at the moment. I guess I need to go
>> >> >through CAS-PGE a bit more and come back to you on the proposed
>> >> >approach.
>> >> >
>> >> >Currently I am testing this by wrapping the /bin/ls command as a
>> >> >GFac service. I may need to test this with a real workflow. Could you
>> >> >please give me some guidance on a better scenario to test this?
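Setting the configuration programmatically, with values such as the FileManagerServer URL read from airavata-server.properties (as the June 10 message later describes), could look roughly like the following Java sketch built on java.util.Properties. The property keys oodt.filemanager.url and oodt.staging.working.dir, the class OodtIntegrationConfig, and the example URL are assumptions for illustration, not actual Airavata or OODT settings.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for OODT integration settings read from a properties file.
public class OodtIntegrationConfig {
    // Assumed key names, for illustration only.
    static final String FM_URL_KEY = "oodt.filemanager.url";
    static final String WORKING_DIR_KEY = "oodt.staging.working.dir";

    final String fileManagerUrl;
    final String workingDir;

    OodtIntegrationConfig(String fileManagerUrl, String workingDir) {
        this.fileManagerUrl = fileManagerUrl;
        this.workingDir = workingDir;
    }

    // Parses properties-format text. The File Manager URL is required; the
    // working directory falls back to the JVM temp directory.
    static OodtIntegrationConfig load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String url = props.getProperty(FM_URL_KEY);
        if (url == null || url.trim().isEmpty()) {
            throw new IllegalArgumentException("missing required property " + FM_URL_KEY);
        }
        String dir = props.getProperty(WORKING_DIR_KEY, System.getProperty("java.io.tmpdir"));
        return new OodtIntegrationConfig(url.trim(), dir.trim());
    }

    public static void main(String[] args) {
        // Example value only; a real deployment would read airavata-server.properties.
        String text = "oodt.filemanager.url=http://localhost:9000\n";
        OodtIntegrationConfig cfg = load(new StringReader(text));
        System.out.println(cfg.fileManagerUrl); // http://localhost:9000
    }
}
```

In practice the Reader would come from the server's properties file (for example via Files.newBufferedReader); keeping the settings there is one way to avoid the separate pge-config.xml and oodt-integration.properties files.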
>> >> >
>> >> >Cheers,
>> >> >Sanjaya
>> >> >
>> >> >On Mon, Jun 3, 2013 at 8:17 PM, Mattmann, Chris A (398J) <
>> >> >[email protected]> wrote:
>> >> >
>> >> >> Hi Sanjaya,
>> >> >>
>> >> >> -----Original Message-----
>> >> >>
>> >> >> From: Sanjaya Medonsa <[email protected]>
>> >> >> Reply-To: "[email protected]" <[email protected]>
>> >> >> Date: Thursday, May 30, 2013 5:12 AM
>> >> >> To: "[email protected]" <[email protected]>,
>> >> >> "[email protected]" <[email protected]>
>> >> >> Subject: Apache Airavata-OODT Integration
>> >> >>
>> >> >> >Hi,
>> >> >> > I have worked on the Apache Airavata integration with Apache
>> >> >> >OODT. As a first step, I have implemented integration with the
>> >> >> >Apache OODT file manager component.
>> >> >>
>> >> >> Great work!!
>> >> >>
>> >> >> Comments below:
>> >> >>
>> >> >> > 1. Introduced a new GFac schema type called OODTProduct, which
>> >> >> >    takes Apache OODT product IDs as input.
>> >> >> > 2. Implemented a new pre-GFac handler by extending Apache OODT
>> >> >> >    PgeTaskInstance to stage the corresponding file into the
>> >> >> >    working directory.
>> >> >> > 3. Once the file is staged, the input parameter holding the OODT
>> >> >> >    product id is replaced with the path of the staged file for
>> >> >> >    downstream processing.
>> >> >> >
>> >> >> >I have tested the implementation with a GFac application that
>> >> >> >wraps the /bin/ls command. The application takes a product id as
>> >> >> >input, stages the corresponding file into the working directory,
>> >> >> >and executes /bin/ls against the staged file. I hope this is a
>> >> >> >valid testing scenario.
>> >> >> >
>> >> >> >Concerns
>> >> >> >- Configurations: I have added a new configuration file named
>> >> >> >  oodt-integration.properties in addition to the
>> >> >> >  dynamic_metadata.met and pge-config.xml files used by OODT. But
>> >> >> >  at the moment no item is configured in
>> >> >> >  oodt-integration.properties.
>> >> >>
>> >> >> You probably only need the pge-config.xml file. The dynamic
>> >> >> metadata and the task configuration properties can be specified
>> >> >> programmatically, right?
>> >> >>
>> >> >> >- Staged File Name: With the current implementation of
>> >> >> >  PgeTaskInstance, it is not possible to retrieve the path of the
>> >> >> >  staged file. Due to this limitation, I query the
>> >> >> >  FileManagerServer with the product id to retrieve the file
>> >> >> >  name, and compute the file path using the working directory
>> >> >> >  information.
>> >> >>
>> >> >> I'm not sure I understand this. If you store and record the
>> >> >> Filename and FileLocation metadata, then you can easily retrieve
>> >> >> the staged file path via a SQL query in CAS-PGE by simply setting
>> >> >> FORMAT=('$FileLocation/$Filename') in the response.
>> >> >> Can you comment on this?
>> >> >>
>> >> >> >- Currently it is not possible to execute the workflow using
>> >> >> >  XBaya, due to a validation failure caused by the new schema
>> >> >> >  type. I have commented out the relevant validation code for
>> >> >> >  testing purposes.
>> >> >>
>> >> >> OK, will probably need to work on this.
>> >> >>
>> >> >> >Currently I am having an issue with the Review Board client tool
>> >> >> >and need to resolve it to upload the code for review.
>> >> >>
>> >> >> I see later that you got this working, so will head over and
>> >> >> review that now.
>> >> >>
>> >> >> Thanks!
>> >> >>
>> >> >> Cheers,
>> >> >> Chris
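To make the FORMAT=('$FileLocation/$Filename') suggestion concrete, the Java sketch below shows what such a template yields once a product's Filename and FileLocation metadata are known. It is a plain regex substitution written for illustration, not CAS-PGE's own template handling; the class FormatTemplateSketch and the sample metadata values are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of expanding a "$Key" template from product metadata.
public class FormatTemplateSketch {
    // Matches $ followed by a metadata key name such as FileLocation or Filename.
    private static final Pattern VAR = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    static String format(String template, Map<String, String> metadata) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = metadata.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no metadata for $" + m.group(1));
            }
            // quoteReplacement keeps '$' or '\' inside values from being treated as group refs.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> met = Map.of(
                "FileLocation", "/data/archive/prod-1",
                "Filename", "input.dat");
        System.out.println(format("$FileLocation/$Filename", met)); // /data/archive/prod-1/input.dat
    }
}
```

Failing loudly on a missing key is a design choice for the sketch; a silently empty path segment would otherwise be easy to miss downstream.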
