Dear Professor,  

We are stuck with OODT. The most critical problem we have now is:

“How do we make the crawler work with the workflow manager?”

--  
Zichuan Wang
University of Southern California, Department of Computer Science


On Tuesday, October 28, 2014, at 12:52 PM, Luke wrote:

> Dear Professor Mattmann,
> Thank you very much for your kind help; it is much appreciated, and sorry
> for the delay in getting back to you. I have been testing OODT based on
> your advice, but unfortunately I have run into another problem...
>  
> I am following the steps at
> (https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example)
> to get a sense of how to get the workflow working.
> The problem is that the File-Concatenator-PGE (submitted with the wmgr-client
> command line) does not seem to be invoked or executed. Instead, the tasks
> pile up in the workflow manager with a status of either "RSUBMIT" or
> "QUEUED" and never execute (see the attached workflow_monitor.jpg). Note
> that by default the workflow min pool size is 6, which leads to a second
> problem: I have 6 submitted tasks with status RSUBMIT, so any new incoming
> task is forwarded to the waiting queue with status "QUEUED". As
> workflow_monitor.jpg shows, I currently have 3 QUEUED workflow tasks and
> 6 RSUBMIT tasks.
>  
> Question 1): I am not sure why the workflow is not being executed and hangs
> in the "RSUBMIT" state. After raising the log level, I see the following
> entry in the log; I am not sure whether it is related to the hang, where
> the workflow never executes and stays in the "RSUBMIT" state.
> Oct 28, 2014 3:35:07 AM org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread safeCheckJobComplete
> WARNING: Exception checking completion status for job: [2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception: java.lang.NullPointerException
>  
> Question 2): I think any new workflow task I send with the following
> command is currently being directed to the waiting queue because of the
> min pool size of 6 (although I could increase it to a larger number):
> ./wmgr-client --url http://localhost:9200 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
> If possible, I would also like to know whether there is a way to purge the
> queue and remove the workflow tasks I have already sent that are in the
> "RSUBMIT" or "QUEUED" state. Please kindly help.
>  
> I am very sorry to trouble you with this. To be honest, I find OODT a bit
> challenging to grasp in a short time frame, probably because there is no
> "OODT in Action" book the way there is "Solr in Action", so what I am doing
> is trial and error mixed with guesswork. I don't want to guess blindly, so
> I would appreciate it if you could also point me to where I can get more
> logging information, or other ways to troubleshoot. I think it would be
> worth tracking what happens when a workflow reaches the "RSUBMIT" status,
> and finding out how to get logging information specific to that state...
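The log entry quoted in Question 1 appears to use the default java.util.logging layout: a header line with timestamp, class and method, followed by a `LEVEL: message` line. Under that assumption, a small filter can pull only the workflow-engine warnings out of a noisy log. This is a hypothetical helper, not part of OODT; it assumes the default `SimpleFormatter` format and ignores continuation lines such as stack traces.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Header line as written by java.util.logging's default SimpleFormatter, e.g.
# "Oct 28, 2014 3:35:07 AM <source class> <method>"
HEADER = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<source>\S+)(?: (?P<method>\S+))?$"
)
# The line that follows the header: "LEVEL: message text"
MESSAGE = re.compile(
    r"^(?P<level>SEVERE|WARNING|INFO|CONFIG|FINE|FINER|FINEST): (?P<msg>.*)$"
)

Record = Tuple[str, str, str, str]  # (timestamp, source class, level, message)


def records(lines: Iterable[str]) -> Iterator[Record]:
    """Pair each header line with the LEVEL: message line right after it."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        h = HEADER.match(line)
        if h:
            header = h
            continue
        m = MESSAGE.match(line)
        if m and header is not None:
            yield header["ts"], header["source"], m["level"], m["msg"]
            header = None
        # Any other line (e.g. a stack-trace frame) is skipped.


def problems(lines: Iterable[str],
             source_prefix: str = "org.apache.oodt.cas.workflow") -> List[Record]:
    """Keep only WARNING/SEVERE records from classes under source_prefix."""
    return [
        r for r in records(lines)
        if r[1].startswith(source_prefix) and r[2] in ("WARNING", "SEVERE")
    ]
```

For example, `problems(open("wmgr.log"))` (the log file name is a placeholder) would return the `safeCheckJobComplete` warning above while dropping INFO chatter and warnings from other components.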
>  
> Again, your advice and kind help will be appreciated as usual.
>  
>  
> Thanks
> Luke
>  
> > -----Original Message-----
> > From: Mattmann, Chris A (3980) [mailto:[email protected]]
> > Sent: October 26, 2014 22:18
> > To: Luke; 'Zichuan Wang'
> > Cc: 'Christian Alan Mattmann'; [email protected];
> > [email protected]; [email protected]
> > Subject: Re: re: Question about OODT file manager
> >  
> > Hi Luke,
> >  
> > Thanks and sorry it’s taken me a while to reply. Here are some details 
> > below:
> >  
> >  
> > -----Original Message-----
> > From: Luke <[email protected]>
> > Date: Sunday, October 26, 2014 at 6:19 PM
> > To: Chris Mattmann <[email protected]>, 'Zichuan Wang' <[email protected]>
> > Cc: Chris Mattmann <[email protected]>, "[email protected]" <[email protected]>,
> > "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
> > Subject: RE: re: Question about OODT file manager
> >  
> > > Hi Professor Mattmann and OODT DEV,
> > >  
> > > Sorry to trouble you with this email. Our team has been struggling to
> > > use OODT to send JSON files to Solr. One of the difficulties is still
> > > getting the OODT workflow to call poster.py in ETLlib.
> > >  
> >  
> >  
> > Sorry that you’re having difficulty; let me try to help.
> >  
> > >  
> > > I am not sure whether my understanding of the OODT requirements is
> > > correct; I hope you can advise us and help clear up our confusion.
> > >  
> > > The set of goals I have in mind for OODT is as follows; please confirm
> > > or correct them:
> > >  
> > > 1)
> > > Get the File-Manager up and running.
> > >  
> >  
> >  
> > Yep, hopefully as installed via OODT RADIX.
> >  
> > > 2)
> > > Send all JSON files to the File Manager server with the wmgr-client
> > > command. (I believe we can do this with a bash or Python script that
> > > calls the command line sequentially, passing each JSON file name as an
> > > argument?)
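The "one command-line call per JSON file" script described in goal 2 could look roughly like the sketch below. Note that `wmgr-client` submits events to the Workflow Manager (port 9200 in this thread), not to the File Manager, and the reply in this thread suggests the crawler for ingestion instead. The flags are copied from the `wmgr-client` command quoted in Question 2; the event name and the metadata key `InputFile` are hypothetical placeholders that would have to match your own workflow and CAS-PGE configuration.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def send_event_command(json_file: Path, event_name: str,
                       wmgr_url: str = "http://localhost:9200",
                       key: str = "InputFile") -> List[str]:
    """Build one wmgr-client --sendEvent call for a single JSON file.

    The flags mirror the command quoted in this thread. event_name and the
    metadata key "InputFile" are hypothetical: use whatever your workflow
    and CAS-PGE configuration actually expect.
    """
    return ["./wmgr-client", "--url", wmgr_url,
            "--operation", "--sendEvent", "--eventName", event_name,
            "--metaData", "--key", key, str(json_file)]


def send_all(json_files: List[Path], event_name: str,
             dry_run: bool = True) -> List[List[str]]:
    """Submit one event per file, sequentially; with dry_run, only print them."""
    commands = [send_event_command(f, event_name) for f in sorted(json_files)]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises on the first non-zero exit
    return commands
```

Running with `dry_run=True` first is a cheap way to eyeball the exact commands before flooding the queue, which matters given the RSUBMIT/QUEUED pile-up described above.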
> > >  
> >  
> >  
> > Suggestion:
> >  
> > 1. Use the OODT crawler and File Manager to crawl/index the JSON files
> >    (in-place data transfer).
> > 2. Take a look at CAS-PGE; it will help you write a workflow task that
> >    wraps ETLlib and the poster command.
> > 3. Once you are confident with #2, whip up a script that pages through all
> >    of your indexed JSON files and, for each one, submits a workflow event
> >    (you may need to look into aggregating them) that calls your
> >    CAS-PGE-wrapped poster task from ETLlib.
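Step 3 (page through the indexed files, optionally aggregating them into fewer events) can be sketched independently of any OODT API. In this sketch the page fetcher is a hypothetical injected function standing in for a File Manager query, and the metadata key `InputFiles` is an illustrative name only; each returned dict would become the metadata of one submitted workflow event.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical page fetcher: given a page number, return the product file
# paths on that page, or an empty list when there are no more pages. In a
# real setup this would query the File Manager; it is injected here so the
# paging and batching logic can be shown on its own.
PageFetcher = Callable[[int], List[str]]


def iter_products(fetch_page: PageFetcher) -> Iterator[str]:
    """Walk pages 1, 2, 3, ... until an empty page comes back."""
    page = 1
    while True:
        items = fetch_page(page)
        if not items:
            return
        yield from items
        page += 1


def batched_events(fetch_page: PageFetcher,
                   batch_size: int) -> List[Dict[str, List[str]]]:
    """Group indexed files into batches; each batch is one event's metadata."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    events: List[Dict[str, List[str]]] = []
    batch: List[str] = []
    for path in iter_products(fetch_page):
        batch.append(path)
        if len(batch) == batch_size:
            events.append({"InputFiles": batch})
            batch = []
    if batch:  # flush the final, partially filled batch
        events.append({"InputFiles": batch})
    return events
```

With `batch_size=1` this degenerates to one event per file; larger batches trade per-file retry granularity for far fewer queued workflow instances.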
> >  
> > > 3)
> > > Once the JSON files are sent to and stored in the File Manager, we need
> > > to get the Workflow Manager up and running and create a workflow that
> > > sends those JSON files from the File Manager to Solr.
> > >  
> >  
> >  
> > See above.
> >  
> > > 4)
> > > Create a workflow according to the
> > > Workflow2 User Guide
> > > <https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>
> > > This is where the problem comes in...
> > >  
> > > I am not sure how to create a workflow task that can call poster.py in
> > > the Python ETLlib. It looks like we need to create our own Java class
> > > that extends <TaskInstance>, an abstract Java class with one abstract
> > > method that has the following signature:
> > >  
> > >  
> > > protected abstract ResultsState performExecution(ControlMetadata crtlMetadata);
> > > However, that page does not say where to find the corresponding
> > > libraries or where to put our implementation in the Workflow Manager.
> > > I am not sure whether we should use TaskInstance, but it seems the
> > > workflow needs an interface through which it can call the Python code
> > > (i.e., poster.py), and it looks like we need to implement
> > > TaskInstance::performExecution with code that calls poster.py and
> > > returns the ResultsState.
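Whatever the wrapper ends up being (a custom Java task using something like `java.lang.ProcessBuilder`, or CAS-PGE as suggested elsewhere in this thread), its core job is the same: run poster.py as an external process and turn its exit status into success or failure. That core logic is sketched here in Python for brevity. `TaskResult` is a hypothetical stand-in, not OODT's `ResultsState`, and the poster.py path and arguments are left to the caller because the thread does not specify them.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskResult:
    """Hypothetical stand-in for a workflow result state (not the OODT type)."""
    success: bool
    message: str


def python_argv(script: str, *args: str) -> List[str]:
    """argv that runs script with the current Python interpreter."""
    return [sys.executable, script, *args]


def run_external_task(argv: List[str],
                      timeout: Optional[float] = None) -> TaskResult:
    """Run an external command (e.g. poster.py) and map its exit code to a result."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Executable missing or process hung: report failure instead of raising.
        return TaskResult(False, f"could not run {argv[0]}: {exc}")
    if proc.returncode == 0:
        return TaskResult(True, proc.stdout.strip())
    # Non-zero exit: surface stderr so the failure shows up in the task's log.
    return TaskResult(False, f"exit {proc.returncode}: {proc.stderr.strip()}")
```

A call would look like `run_external_task(python_argv("/path/to/poster.py", ...))`, where the path and the remaining arguments are placeholders for however ETLlib's poster is actually invoked.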
> > >  
> > >  
> > > It would be greatly appreciated if you could shed some light and advise
> > > how we can get a task instance to call poster.py. By the way, I am not
> > > sure my understanding is correct, so please correct it if it is not.
> > > Your help will be appreciated as usual.
> > >  
> > >  
> > >  
> > > Thanks
> > > Luke
> > >  
> >  
> >  
> > Thanks Luke, see above. Let me know if it helps.
> >  
> > Cheers!
> >  
> > Chris
> >  
> > >  
> > > From: Mattmann, Chris A (3980) [mailto:[email protected]]
> > > Sent: October 25, 2014 13:34
> > > To: Zichuan Wang
> > > Cc: Christian Alan Mattmann; Luke; [email protected]; [email protected]
> > > Subject: Re: Re: Question about OODT file manager
> > >  
> > >  
> > >  
> > > Please cc [email protected]; I will reply in detail soon.
> > >  
> > > Sent from my iPhone
> >  
> >  
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Chief Architect
> > Instrument Software and Science Data Systems Section (398)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 168-519, Mailstop: 168-527
> > Email: [email protected]
> > WWW: http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Adjunct Associate Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  
> > >  
> > >  
> > > On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <[email protected]> wrote:
> > >  
> > >  
> > > Dear Professor,
> > >  
> > >  
> > >  
> > > Could you please also explain how I can crawl all JSON file names under a
> > > specific directory using CAS-PGE? I will work through this example:
> > > https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
> > > but it doesn’t mention anything about crawling; instead, it manually
> > > sets the input file paths...
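OODT's crawler is the tool meant for ingesting files into the File Manager. If all that is needed is the list of JSON file names under a directory (for example, to feed a script that submits workflow events), a plain recursive directory walk is enough. This sketch only lists files; it does not ingest or index anything, and treating `.JSON` the same as `.json` is an assumption.

```python
from pathlib import Path
from typing import List


def find_json_files(root: str) -> List[Path]:
    """Return every *.json file under root (recursively), sorted for stable order."""
    base = Path(root)
    if not base.is_dir():
        raise NotADirectoryError(root)
    # rglob("*") walks all subdirectories; the suffix check is case-insensitive
    # so files such as DATA.JSON are included too.
    return sorted(
        p for p in base.rglob("*")
        if p.is_file() and p.suffix.lower() == ".json"
    )
```

Sorting keeps repeated runs deterministic, which makes it easier to resume a partially completed batch of submissions.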
> > >  
> > >  
> > >  
> > >  
> > > --
> > >  
> > > Zichuan Wang
> > >  
> > > University of Southern California, Department of Computer Science
> > >  
> > >  
> > >  
> > >  
> > > On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
> > >  
> > > Dear Professor,
> > >  
> > >  
> > >  
> > > In the assignment 2 specification, I noticed that you mentioned the OODT
> > > File Manager, but my understanding is that we are using the ETLlib
> > > poster, which talks directly to Solr. So how can we use the OODT File
> > > Manager in this assignment?
> > >  
> > >  
> > >  
> > > --
> > >  
> > > Zichuan Wang
> > >  
> > > University of Southern California, Department of Computer Science  
>  
>  
> Attachments:  
> - workflow_monitor.jpg
>  

