Hi Luke,

Thanks, and sorry it’s taken me a while to reply. Here are some details below:

-----Original Message-----
From: Luke <[email protected]>
Date: Sunday, October 26, 2014 at 6:19 PM
To: Chris Mattmann <[email protected]>, 'Zichuan Wang' <[email protected]>
Cc: Chris Mattmann <[email protected]>, "[email protected]" <[email protected]>, "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
Subject: RE: re: Question about OODT file manager

>Hi Professor Mattmann and OODT DEV,
>
>Sorry to trouble you with this email; our team has been struggling to get
>OODT to send JSON files to Solr.
>One of the difficulties is still getting the OODT workflow to call
>poster.py in etllib.

Sorry that you’re having difficulty; let me try and help.

>
>I am not sure if my understanding of the OODT requirement is correct; I
>hope you can kindly advise and help with our confusion.
>
>The set of goals I have in mind with OODT is as follows; please kindly
>confirm and clarify:
>
>1)
>Get the File-Manager up and running.

Yep, hopefully as installed via OODT RADIX.

>2)
>Send all JSON files with the command wmgr-client to the fileManager server.
>(I believe we can achieve this with a bash script, or probably a Python
>script, that calls the command line sequentially with each JSON file name
>as an argument?!)

Suggestion:

1. Use the OODT crawler and file manager to crawl/index the JSON files
   (in-place data transfer).
2. Take a look at CAS-PGE; it will help you write a workflow task that
   wraps ETLlib and the poster command.
3. Once you are confident with #2, whip up a script that pages through all
   of your indexed JSON files and then, for each one, submits a workflow
   event (you may need to look into aggregating them) that calls your
   CAS-PGE-wrapped poster task from ETLlib.

>3)
>Once we have the JSON files sent and stored in the File-Manager, we need to
>get the workflow-manager up and running, and we can create a workflow
>that sends those JSON files from the file manager to Solr.

See above.

>4)
>Create a workflow according to the
>Workflow2 User Guide
><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>
>>>>>>>>>>> here comes the problem...
>I am not sure how to create a workflow task that can call
>poster.py in the Python etllib. It looks like we need to create our own Java
>class that extends <TaskInstance>, which is an abstract Java class with
>one abstract method that has the following signature:
>
>
>protected abstract ResultsState performExecution(ControlMetadata
>crtlMetadata);
>
>However, the details of where to find the corresponding libs and
>where to put our implementation in the workflow manager are missing
>from that page. I am not sure if we should use TaskInstance, but it seems
>the workflow has to have an interface through which it can call the Python
>code, i.e. poster.py, and it looks like we need to implement
>TaskInstance::performExecution by injecting the code
>that calls poster.py and returns the ResultsState.
>
>
>It would be greatly appreciated if you could shed some light on this and
>advise how we can get a task instance to call poster.py. BTW, I am
>also not sure if my understanding is correct; please kindly correct it
>if it is not. Your help will be appreciated as usual.
>
>
>
>Thanks
>Luke

Thanks Luke, see above. Let me know if it helps.

Cheers!
Chris

>
>From: Mattmann, Chris A (3980) [mailto:[email protected]]
>
>Sent: October 25, 2014, 13:34
>To: Zichuan Wang
>Cc: Christian Alan Mattmann; Luke; [email protected]; [email protected]
>Subject: Re: Re: Question about OODT file manager
>
>
>
>Please cc
>[email protected] <mailto:[email protected]>. I will reply in detail
>soon.
>
>Sent from my iPhone

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

>
>
>On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <[email protected]> wrote:
>
>
>Dear Professor,
>
>
>
>Could you please also explain how I can crawl all JSON file names under a
>specific directory using CAS-PGE? I’ll work through this example,
>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example,
>but it doesn’t mention anything about crawling; instead it manually sets
>the input file paths...
>
>
>--
>
>Zichuan Wang
>
>University of Southern California, Department of Computer Science
>
>
>
>On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
>
>Dear Professor,
>
>
>
>In the assignment 2 specification I noticed that you mentioned the OODT File
>Manager, but from my understanding, we are using the ETLLib poster, which
>talks directly to Solr. So how can we use the OODT File Manager in this
>assignment?
>
>
>
>--
>
>Zichuan Wang
>
>University of Southern California, Department of Computer Science
>
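
For reference, step 3 of the suggestion above (page through the JSON files and submit one workflow event per file or per aggregated batch) could be sketched roughly as below. This is only an illustration under stated assumptions, not OODT code: it walks a directory on disk as a stand-in for paging through the File Manager catalog, and the names find_json_files, batches, submit_batches, and base_cmd are all hypothetical. The actual workflow-manager client invocation is deliberately left as a caller-supplied placeholder, since its arguments depend on your install.

```python
import subprocess
from pathlib import Path


def find_json_files(root):
    """Return all *.json files under root, sorted for a stable order.

    Assumption: a directory walk stands in for paging through the
    File Manager catalog of indexed products.
    """
    return sorted(p for p in Path(root).rglob("*.json") if p.is_file())


def batches(items, size):
    """Yield consecutive slices of at most `size` items (the aggregation step)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def submit_batches(root, base_cmd, batch_size=50, dry_run=True):
    """Build one command per batch of files and return the list of commands.

    base_cmd is a HYPOTHETICAL placeholder: the command (as a list of
    strings) that sends a workflow event in your setup. The file paths of
    each batch are appended to it. With dry_run=True nothing is executed.
    """
    commands = []
    for batch in batches(find_json_files(root), batch_size):
        cmd = list(base_cmd) + [str(p) for p in batch]
        commands.append(cmd)
        if not dry_run:
            # check=True stops at the first failed submission.
            subprocess.run(cmd, check=True)
    return commands
```

Calling submit_batches("/path/to/json", ["echo", "submit-event"], batch_size=10) returns the commands it would run without executing anything; swapping in your real client command and passing dry_run=False would then submit one event per batch.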
