Dear Professor,

We are stuck in OODT. The most critical problem we have now is: "How do we make the crawler work with the workflow?"

--
Zichuan Wang
University of Southern California, Department of Computer Science

On Tuesday, October 28, 2014, at 12:52 PM, Luke wrote:
> Dear Professor Mattmann,
>
> Thank you very much for your kind help; it is appreciated. Sorry for only now getting back to you with my thanks. I have been testing OODT based on your advice, but unfortunately I have run into another problem.
>
> I am following the steps at
> https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
> to get a sense of how to get a workflow to work.
>
> The problem is that the File-Concatenator-PGE (run from the wmgr-client command line) does not seem to be invoked or executed. Instead, tasks are stacking up in the workflow manager with status either "RSUBMIT" or "QUEUED", and they are not being executed (PFA: workflow_monitor.jpg). Please note that by default the workflow min pool size is 6, which leads to another problem: I have 6 submitted tasks with status "RSUBMIT", and any new incoming task is forwarded to the waiting queue with status "QUEUED". Please refer to workflow_monitor.jpg for details, where I have 3 QUEUED workflow tasks and 6 RSUBMIT tasks.
>
> Question 1): I am not sure why the workflow is not being executed and is hanging in the "RSUBMIT" state. After raising the log level, I see the following entry in the log. I am not sure whether it has anything to do with the workflow hanging in the "RSUBMIT" state.
>
> Oct 28, 2014 3:35:07 AM org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread safeCheckJobComplete
> WARNING: Exception checking completion status for job: [2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception: java.lang.NullPointerException
>
> Question 2): I think that, on my side, any new workflow task I send with the following command is currently being directed to the waiting queue because of the min pool size (i.e. 6), although I can increase this to a larger number:
>
> ./wmgr-client --url http://localhost:9200 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
>
> If possible, I would like to know whether there is a way to purge the queue and get rid of the workflow tasks I have already sent that are in either "RSUBMIT" or "QUEUED". Please kindly help.
>
> Very sorry for troubling you with this. To be honest, I find OODT a bit challenging to grasp within a short time frame, probably because there is no book like "OODT in Action", as there is for Solr. What I am doing is trial and error blended with guessing, but I don’t want to guess blindly. I would appreciate it if you could also shed some light on where I can get more logging information, or on other ways I can troubleshoot. I think it might be worth tracking what happens when a workflow reaches the status "RSUBMIT", and how to get logging specific to it.
>
> Again, your advice and kind help will be appreciated as usual.
>
> Thanks
> Luke
>
> > -----Original Message-----
> > From: Mattmann, Chris A (3980) [mailto:[email protected]]
> > Sent: October 26, 2014 22:18
> > To: Luke; 'Zichuan Wang'
> > Cc: 'Christian Alan Mattmann'; [email protected]; [email protected]; [email protected]
> > Subject: Re: re: Question about OODT file manager
> >
> > Hi Luke,
> >
> > Thanks and sorry it’s taken me a while to reply. Here are some details below:
> >
> > -----Original Message-----
> > From: Luke <[email protected]>
> > Date: Sunday, October 26, 2014 at 6:19 PM
> > To: Chris Mattmann <[email protected]>, 'Zichuan Wang' <[email protected]>
> > Cc: Chris Mattmann <[email protected]>, "[email protected]" <[email protected]>, "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
> > Subject: RE: re: Question about OODT file manager
> >
> > > Hi Professor Mattmann and OODT DEV,
> > >
> > > Sorry to trouble you with this email. Our team has been struggling to get OODT to send JSON files to Solr. One of the difficulties is still getting an OODT workflow to call poster.py in etllib.
> >
> > Sorry that you’re having difficulty; let me try and help.
> >
> > > I am not sure whether my understanding of the OODT requirement is correct. I hope you can kindly advise and help with our confusion.
> > >
> > > The set of goals I have in mind with OODT is as follows. Please kindly confirm and clarify:
> > >
> > > 1) Get the File-Manager up and running.
> >
> > Yep, hopefully as installed via OODT RADIX.
> >
> > > 2) Send all JSON files to the File Manager server with the wmgr-client command. (I believe we can achieve this with a bash script, or perhaps a Python script, that calls the command line sequentially with each JSON file name as an argument?)
> >
> > Suggestion:
> >
> > 1. Use the OODT crawler and file manager to crawl/index the JSON files (in-place data transfer).
> > 2. Take a look at CAS-PGE; it will help you write a workflow task that wraps ETLlib and the poster command.
> > 3. Once you are confident with #2, whip up a script that pages through all of your indexed JSON files and, for each one, submits a workflow event (you may need to look into aggregating them) that calls your CAS-PGE-wrapped poster task from ETLlib.
> >
> > > 3) Once we have the JSON files sent and stored in the File Manager, we need to get the workflow manager up and running, and then we can create a workflow that sends those JSON files from the File Manager to Solr.
> >
> > See above.
> >
> > > 4) Create a workflow according to the Workflow2 User Guide
> > > <https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>
> > >
> > > Here comes the problem.
> > >
> > > I am not sure how to create a workflow task that can call poster.py in the Python etllib. It looks like we need to create our own Java class that extends TaskInstance, which is an abstract Java class with one abstract method that has the following signature:
> > >
> > > protected abstract ResultsState performExecution(ControlMetadata crtlMetadata);
> > >
> > > However, the details of where to find the corresponding libraries and where to put our implementation in the workflow manager are left out of that page.
> > > I am not sure whether we should use TaskInstance, but it seems the workflow must have an interface through which it can call Python code, i.e. poster.py. It looks like we need to implement TaskInstance::performExecution by adding the code that calls poster.py and returns the ResultsState.
> > >
> > > It would be greatly appreciated if you could shed some light on this and advise how we can get a task instance to call poster.py. I am also not sure whether my understanding is correct, so please correct it where it is wrong. Your help will be appreciated as usual.
> > >
> > > Thanks
> > > Luke
> >
> > Thanks Luke, see above. Let me know if it helps.
> >
> > Cheers!
> > Chris
> >
> > > From: Mattmann, Chris A (3980) [mailto:[email protected]]
> > > Sent: October 25, 2014 13:34
> > > To: Zichuan Wang
> > > Cc: Christian Alan Mattmann; Luke; [email protected]; [email protected]
> > > Subject: Re: Re: Question about OODT file manager
> > >
> > > Please cc [email protected]. I will reply in detail soon.
> > >
> > > Sent from my iPhone
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Chief Architect
> > Instrument Software and Science Data Systems Section (398)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 168-519, Mailstop: 168-527
> > Email: [email protected]
> > WWW: http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Associate Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> > > On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <[email protected]> wrote:
> > >
> > > Dear Professor,
> > >
> > > Could you please also explain how I can crawl all JSON file names under a specific directory using CAS-PGE? I’ll work through this example, https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example, but it doesn’t mention anything about crawling; instead it manually sets the input file paths.
> > >
> > > --
> > > Zichuan Wang
> > > University of Southern California, Department of Computer Science
> > >
> > > On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
> > >
> > > Dear Professor,
> > >
> > > In the assignment 2 specification I noticed that you mentioned the OODT File Manager, but from my understanding we are using the ETLlib poster, which talks directly to Solr. So how can we use the OODT File Manager in this assignment?
> > >
> > > --
> > > Zichuan Wang
> > > University of Southern California, Department of Computer Science
>
> Attachments:
> - workflow_monitor.jpg
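
The script suggested in the thread (walk the JSON files and submit one workflow event per file) could be sketched roughly as below. This is only an illustration under assumptions, not a tested OODT recipe: the argument layout is copied from the wmgr-client command quoted in the thread, the metadata key `InputFile` is a hypothetical name, and the event name would need to match whatever event the CAS-PGE-wrapped poster task is registered under. The sketch walks a directory directly instead of paging through the File Manager catalog, and it defaults to a dry run that only builds the commands.

```python
import subprocess
from pathlib import Path


def find_json_files(root):
    """Return every *.json file under root, sorted for a stable order."""
    return sorted(p for p in Path(root).rglob("*.json") if p.is_file())


def build_send_event_command(wmgr_client, url, event_name, key, value):
    """Build an argument list shaped like the wmgr-client call in the thread."""
    return [
        str(wmgr_client),
        "--url", url,
        "--operation", "--sendEvent",
        "--eventName", event_name,
        "--metaData", "--key", key, str(value),
    ]


def submit_events(root,
                  wmgr_client="./wmgr-client",
                  url="http://localhost:9200",
                  event_name="fileconcatenator-pge",
                  key="InputFile",  # hypothetical metadata key, not from OODT
                  dry_run=True):
    """Build one sendEvent command per JSON file; run them unless dry_run."""
    commands = [
        build_send_event_command(wmgr_client, url, event_name, key, path)
        for path in find_json_files(root)
    ]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first command that exits non-zero.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `submit_events("/path/to/json")` returns the list of commands without running anything, which makes it easy to inspect them first; passing `dry_run=False` actually invokes wmgr-client. Given the queueing behaviour described in the thread, submitting events in small batches may be safer than sending them all at once.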
