Dear Professor Mattmann,

Thanks a lot for the kind help, it is much appreciated, and sorry for only now getting back to you with my thanks. I have been conducting tests with OODT based on your advice, but unfortunately I have run into another problem.

I am following the steps in the CAS-PGE Learn by Example guide (https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example) to get a sense of how to get a workflow to work. The problem is that the File-Concatenator-PGE (triggered from the wmgr-client command line) does not seem to be invoked or executed. Instead, the tasks are stacking up in the workflow manager with status either "RSUBMIT" or "QUEUED" and never run; please find attached workflow_monitor.jpg.

Please note that by default the workflow min pool size is 6, which leads to another problem: I have 6 submitted tasks with status "RSUBMIT", and any new incoming task is forwarded to the waiting queue with status "QUEUED". In workflow_monitor.jpg you can see 3 QUEUED workflow tasks and 6 RSUBMIT tasks.

Question 1): I am not sure why the workflow is not being executed and hangs in the "RSUBMIT" state. After enabling the log level, I see the following entry in the log, but I am not sure whether it has anything to do with the hang:

    Oct 28, 2014 3:35:07 AM org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread safeCheckJobComplete
    WARNING: Exception checking completion status for job: [2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception: java.lang.NullPointerException

Question 2): I think that any new workflow task I send with the following command is currently being directed to the waiting queue because of the min pool size (i.e. 6), although I can increase this to a larger number:

    ./wmgr-client --url http://localhost:9200 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1

If possible, I would like to know whether there is a way to purge the queue and get rid of the workflow tasks in "RSUBMIT" and "QUEUED" that I have already sent. Please kindly help.

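For reference, here is a minimal sketch of how the sendEvent command from Question 2 could be scripted to submit several runs in a row. It only assembles the same arguments shown above; the function name, the `client` default path, and the second RunID value are hypothetical, and the flags are copied from this email rather than checked against the wmgr-client documentation.

```python
import shlex
from typing import List


def build_send_event_command(
    url: str,
    event_name: str,
    run_id: str,
    client: str = "./wmgr-client",  # assumed location of the client script
) -> List[str]:
    """Assemble the wmgr-client argument list exactly as written in this email."""
    return [
        client,
        "--url", url,
        "--operation", "--sendEvent",
        "--eventName", event_name,
        "--metaData", "--key", "RunID", run_id,
    ]


if __name__ == "__main__":
    # Print one command per run; "testNumber2" is only an illustrative value.
    for run_id in ["testNumber1", "testNumber2"]:
        command = build_send_event_command(
            "http://localhost:9200", "fileconcatenator-pge", run_id
        )
        print(shlex.join(command))
```

The sketch prints the commands instead of running them, so it can be tried without adding more tasks to the queue; passing each list to `subprocess.run` would be the obvious next step once the workflow actually executes.
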
I am very sorry to trouble you with this. To be honest, I find OODT a bit challenging to grasp within a short time frame, probably because there is no book such as "OODT in Action", as there is for Solr. What I am doing is trial and error blended with guesswork, and I don’t want to guess blindly. It would be appreciated if you could also shed some light on where I can get more logging information, or on other ways to troubleshoot. I think it might be worth tracking what happens when a workflow reaches the status "RSUBMIT" and how to get logging information specific to it.

Again, your advice and kind help will be appreciated as usual.

Thanks
Luke

> -----Original Message-----
> From: Mattmann, Chris A (3980) [mailto:[email protected]]
> Sent: October 26, 2014 22:18
> To: Luke; 'Zichuan Wang'
> Cc: 'Christian Alan Mattmann'; [email protected]; [email protected];
> [email protected]
> Subject: Re: re: Question about OODT file manager
>
> Hi Luke,
>
> Thanks and sorry it’s taken me a while to reply. Here are some details below:
>
>
> -----Original Message-----
> From: Luke <[email protected]>
> Date: Sunday, October 26, 2014 at 6:19 PM
> To: Chris Mattmann <[email protected]>, 'Zichuan Wang'
> <[email protected]>
> Cc: Chris Mattmann <[email protected]>, "[email protected]"
> <[email protected]>, "[email protected]" <[email protected]>,
> "[email protected]" <[email protected]>
> Subject: RE: re: Question about OODT file manager
>
> >Hi Professor Mattmann and OODT DEV,
> >
> >Sorry to trouble you with this email, our team has been struggling to
> >get OODT to send JSON files to Solr.
> >One of the difficulties is still getting the OODT workflow to call
> >poster.py in etllib.
>
> Sorry that you’re having difficulty, let me try and help.
>
> >
> >I am not sure if my understanding of the OODT requirement is correct, I
> >hope you can please kindly advise and help with our confusion.
> >
> >The set of goals I have in mind with OODT is as follows, please kindly
> >confirm and clarify:
> >
> >1)
> >Get the File-Manager up and running.
>
> Yep, hopefully as installed via OODT RADIX.
>
> >2)
> >Send all JSON files with the command wmgr-client to the File-Manager server.
> >(I believe we can achieve this with a bash script, or probably Python,
> >that calls the command line sequentially with each JSON file name as an
> >argument?!)
>
> Suggestion:
>
> 1. Use the OODT crawler and file manager to crawl/index the JSON files (in
> place data transfer).
> 2. Take a look at CAS-PGE, it will help you write a workflow task that will
> wrap ETLlib and the poster command.
> 3. Once you are confident with #2, whip up a script that pages through all of
> your indexed JSON files, and then for each one, submits a workflow event (you
> may need to look into aggregating them) that calls your CAS-PGE wrapped
> poster task from ETLlib.
>
> >3)
> >Once we have the JSON files sent and stored in the File-Manager, we need to
> >get the workflow manager up and running, and we can create a workflow that
> >sends those JSON files from the file manager to Solr.
>
> See above.
>
> >4)
> >Create a workflow according to the
> >Workflow2 User Guide
> ><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>
> >
> >>>>>>>>>>> here comes the problem…..
> >
> >I am not sure how to create a workflow task which can call
> >poster.py in the Python etllib. It looks like we need to create our own
> >Java class that extends <TaskInstance>, which is an abstract Java class
> >with one abstract method that has the following signature:
> >
> >
> >protected abstract ResultsState performExecution(ControlMetadata
> >crtlMetadata);
> >
> >However, the detail of where to find the corresponding libs
> >and where to put our implementation in the workflow manager is
> >missing from that page.
> >I am not sure if we should use TaskInstance,
> >but it seems the workflow has to have an interface through which it can
> >call the Python code, i.e. poster.py, and it looks like we need to
> >implement TaskInstance::performExecution by injecting the code that
> >calls poster.py and returns the ResultsState.
> >
> >
> >It would be greatly appreciated if you could please shed some light
> >and advise how we can get a task instance to call poster.py. BTW, I
> >am also not sure if my understanding is correct, please kindly correct
> >it if inappropriate. Your help will be appreciated as usual.
> >
> >Thanks
> >Luke
>
> Thanks Luke, see above. Let me know if it helps.
>
> Cheers!
>
> Chris
>
> >
> >From: Mattmann, Chris A (3980) [mailto:[email protected]]
> >Sent: October 25, 2014 13:34
> >To: Zichuan Wang
> >Cc: Christian Alan Mattmann; Luke; [email protected]; [email protected]
> >Subject: Re: Reply: Question about OODT file manager
> >
> >Please cc
> >[email protected] <mailto:[email protected]> I will reply in detail
> >soon
> >
> >Sent from my iPhone
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> >
> >On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <[email protected]> wrote:
> >
> >
> >Dear Professor,
> >
> >Could you please also explain how I can crawl all JSON file names under a
> >specific directory using CAS-PGE?
> >I’ll work through this example
> >https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example,
> >but it doesn’t mention anything about crawling; instead, it
> >manually sets the input file paths...
> >
> >--
> >Zichuan Wang
> >University of Southern California, Department of Computer Science
> >
> >
> >On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
> >
> >Dear Professor,
> >
> >In the assignment 2 specification I noticed that you mentioned the OODT File
> >Manager, but from my understanding, we are using the ETLlib poster, which
> >talks directly to Solr. So how can we use the OODT File Manager in this
> >assignment?
> >
> >--
> >Zichuan Wang
> >University of Southern California, Department of Computer Science
