Thanks, Professor Mattmann. Not running batch_stub was the main culprit,
and there were some other issues such as missing jars. Sorry for not
confirming this right away; my laptop was crashing and I only just had
time to fix it.

BTW, I was able to get the CAS-PGE example to work. Even though the log
shows that the workflow failed its pre-condition, the combined file and
some metadata files (3 files in total) were still successfully ingested
and placed in the output directory.

BTW, I think there are a lot of mistakes in the documentation. Do you
want us to help correct it (e.g.
https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example )?
If possible, I would like to share my notes on the problem steps in that
page.

Anyway, thanks for your help; it is appreciated.

Thanks
Luke

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:[email protected]]
Sent: Saturday, November 1, 2014 10:48 AM
To: Luke; [email protected]
Cc: 'Christian Alan Mattmann'; [email protected]; [email protected]; 'Zichuan Wang'
Subject: Re: re: Question about OODT file manager

Dear Luke,

Just confirming, we solved this in class, right? It had to do with the
batch stub not being turned on.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Luke <[email protected]>
Date: Tuesday, October 28, 2014 at 12:52 PM
To: Chris Mattmann <[email protected]>, "[email protected]" <[email protected]>
Cc: Chris Mattmann <[email protected]>, "[email protected]" <[email protected]>, "[email protected]" <[email protected]>, 'Zichuan Wang' <[email protected]>
Subject: RE: re: Question about OODT file manager

>Dear Professor Mattmann,
>
>Thanks a lot for the kind help; it is appreciated. Sorry for the delay
>in getting back to you with my thanks. I have been conducting tests
>with OODT based on your advice, but unfortunately I am having another
>problem....
>
>I am following the steps in
>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>to get a sense of how to get a workflow to work.
>
>The problem is that the File-Concatenator-PGE (started by running the
>wmgr-client command line) does not seem to be invoked or executed.
>Instead, tasks are stacking up in the workflow manager with status
>either "RSUBMIT" or "QUEUED" and are never executed (please find
>attached: workflow_monitor.jpg). Note that by default the workflow min
>pool size is 6, which leads to another problem: I have 6 submitted
>tasks with status "RSUBMIT", and any new incoming task is forwarded to
>the waiting queue with status "QUEUED". Please refer to
>workflow_monitor.jpg for details, where I have 3 QUEUED workflow tasks
>and 6 RSUBMIT tasks.
>
>Question 1): I am not sure why the workflow is not being executed and
>is hanging in the "RSUBMIT" state. After raising the log level, I see
>the following entry in the log. I am not sure whether it has anything
>to do with the hang.
>
> Oct 28, 2014 3:35:07 AM
> org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread
> safeCheckJobComplete
> WARNING: Exception checking completion status for job:
> [2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception:
> java.lang.NullPointerException
>
>Question 2): I think that any new workflow task I send with the
>following command is currently being directed to the waiting queue
>because of the min pool size (i.e. 6), although I can increase this to
>a larger number:
>
> ./wmgr-client --url http://localhost:9200 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
>
>If possible, I would like to know whether there is a way to purge the
>queue and get rid of the workflow tasks I have already sent that are in
>either "RSUBMIT" or "QUEUED". Please kindly help.
>
>Very sorry for troubling you with this. To be honest, I find OODT a bit
>challenging to grasp within a short time frame, probably because there
>is no "OODT in Action" book like there is for Solr. What I am doing is
>trial and error blended with guesswork, and I don’t want to guess
>blindly. It would be appreciated if you could also shed some light on
>where I can get more logging information, or on other ways to
>troubleshoot. I think it might be worth tracking what happens when a
>workflow reaches the status "RSUBMIT" and how to get logging specific
>to it.
>
>Again, your advice and kind help will be appreciated as usual.
>
>Thanks
>Luke
>
>> -----Original Message-----
>> From: Mattmann, Chris A (3980) [mailto:[email protected]]
>> Sent: October 26, 2014 22:18
>> To: Luke; 'Zichuan Wang'
>> Cc: 'Christian Alan Mattmann'; [email protected]; [email protected]; [email protected]
>> Subject: Re: re: Question about OODT file manager
>>
>> Hi Luke,
>>
>> Thanks, and sorry it’s taken me a while to reply. Here are some
>> details below:
>>
>> -----Original Message-----
>> From: Luke <[email protected]>
>> Date: Sunday, October 26, 2014 at 6:19 PM
>> To: Chris Mattmann <[email protected]>, 'Zichuan Wang' <[email protected]>
>> Cc: Chris Mattmann <[email protected]>, "[email protected]" <[email protected]>, "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
>> Subject: RE: re: Question about OODT file manager
>>
>> >Hi Professor Mattmann and OODT DEV,
>> >
>> >Sorry to trouble you with this email. Our team has been struggling
>> >to get OODT to send JSON files to Solr. One of the difficulties is
>> >still getting an OODT workflow to call poster.py in etllib.
>>
>> Sorry that you’re having difficulty; let me try and help.
>>
>> >I am not sure whether my understanding of the OODT requirement is
>> >correct. I hope you can kindly advise and help with our confusion.
>> >
>> >My set of goals with OODT is as follows; please kindly confirm and
>> >clarify:
>> >
>> >1)
>> >Get the File Manager up and running.
>>
>> Yep, hopefully as installed via OODT RADIX.
>>
>> >2)
>> >Send all JSON files to the File Manager server with the wmgr-client
>> >command. (I believe we can achieve this with a bash script, or
>> >probably Python, that calls the command line sequentially with each
>> >JSON file name as an argument?!)
>>
>> Suggestion:
>>
>> 1. Use the OODT crawler and file manager to crawl/index the JSON
>> files (in place data transfer).
>> 2. Take a look at CAS-PGE; it will help you write a workflow task
>> that will wrap ETLlib and the poster command.
>> 3. Once you are confident with #2, whip up a script that pages
>> through all of your indexed JSON files and then, for each one,
>> submits a workflow event (you may need to look into aggregating
>> them) that calls your CAS-PGE wrapped poster task from ETLlib.
>>
>> >3)
>> >Once the JSON files are sent and stored in the File Manager, we need
>> >to get the Workflow Manager up and running, and then we can create a
>> >workflow that sends those JSON files from the File Manager to Solr.
>>
>> See above.
>>
>> >4)
>> >Create a workflow according to the Workflow2 User Guide
>> ><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>
>> >
>> >Here comes the problem: I am not sure how to create a workflow task
>> >that can call poster.py in the Python etllib. It looks like we need
>> >to create our own Java class that extends TaskInstance, which is an
>> >abstract Java class with one abstract method that has the following
>> >signature:
>> >
>> >protected abstract ResultsState performExecution(ControlMetadata
>> >crtlMetadata);
>> >
>> >However, that page does not explain where to find the corresponding
>> >libraries or where to put our implementation in the workflow manager.
>> >I am not sure whether we should use TaskInstance, but it seems the
>> >workflow must have an interface through which it can call Python
>> >code such as poster.py, and it looks like we need to implement
>> >TaskInstance::performExecution with code that calls poster.py and
>> >returns the ResultsState.
>> >
>> >It would be greatly appreciated if you could shed some light and
>> >advise how we can get a task instance to call poster.py. BTW, I am
>> >also not sure whether my understanding is correct; please correct it
>> >where it is wrong. Your help will be appreciated as usual.
>> >
>> >Thanks
>> >Luke
>>
>> Thanks Luke, see above. Let me know if it helps.
>>
>> Cheers!
>>
>> Chris
>>
>> >From: Mattmann, Chris A (3980) [mailto:[email protected]]
>> >Sent: October 25, 2014 13:34
>> >To: Zichuan Wang
>> >Cc: Christian Alan Mattmann; Luke; [email protected]; [email protected]
>> >Subject: Re: Re: Question about OODT file manager
>> >
>> >Please cc [email protected] <mailto:[email protected]>. I will
>> >reply in detail soon.
>> >
>> >Sent from my iPhone
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: [email protected]
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>> >On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <[email protected]> wrote:
>> >
>> >Dear Professor,
>> >
>> >Could you please also explain how I can crawl all JSON file names
>> >under a specific directory using CAS-PGE? I’ll work through this
>> >example:
>> >https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>> >but it doesn’t mention anything about crawling; instead it manually
>> >sets the input file paths...
>> >
>> >--
>> >Zichuan Wang
>> >University of Southern California, Department of Computer Science
>> >
>> >On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
>> >
>> >Dear Professor,
>> >
>> >In the assignment 2 specification I noticed that you mentioned OODT
>> >File Manager, but from my understanding, we are using the ETLLib
>> >poster, which talks directly to Solr. So how can we use OODT File
>> >Manager in this assignment?
>> >
>> >--
>> >Zichuan Wang
>> >University of Southern California, Department of Computer Science
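
For anyone following this thread later, here is a rough sketch of the "script that submits one workflow event per JSON file" idea discussed above. It only mirrors the wmgr-client invocation quoted in the thread; it is not taken from the OODT documentation. The names find_json_files, build_command, submit_all and the metadata key InputFile are hypothetical, and passing a second --key pair is an assumption that should be checked against your wmgr-client version. It also walks a directory directly rather than paging through the File Manager catalog, so treat it as a starting point only.

```python
import shlex
import subprocess
import sys
from pathlib import Path
from typing import List

WMGR_CLIENT = "./wmgr-client"        # run from the workflow manager's bin directory
WMGR_URL = "http://localhost:9200"   # URL used in the thread
EVENT_NAME = "fileconcatenator-pge"  # event from the thread; swap in your own event
INPUT_KEY = "InputFile"              # hypothetical metadata key, not an OODT name


def find_json_files(root: str) -> List[Path]:
    """Return every *.json file under root, sorted for a stable order."""
    return sorted(p for p in Path(root).rglob("*.json") if p.is_file())


def build_command(json_file: Path, run_id: str) -> List[str]:
    """Build one wmgr-client sendEvent invocation for a single JSON file."""
    return [
        WMGR_CLIENT, "--url", WMGR_URL,
        "--operation", "--sendEvent",
        "--eventName", EVENT_NAME,
        "--metaData",
        "--key", "RunID", run_id,
        # Assumption: a second --key pair is accepted; verify before relying on it.
        "--key", INPUT_KEY, str(json_file),
    ]


def submit_all(root: str, dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one command per JSON file under root."""
    commands = []
    for index, json_file in enumerate(find_json_files(root), start=1):
        command = build_command(json_file, f"run{index}")
        commands.append(command)
        if dry_run:
            # Print the command instead of contacting the workflow manager.
            print(shlex.join(command))
        else:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python submit_events.py /path/to/json/dir  (prints commands only)
    submit_all(sys.argv[1], dry_run=True)
```

The dry-run default is deliberate: given the "RSUBMIT"/"QUEUED" pile-up described above, it seems safer to inspect the generated commands first and only pass dry_run=False once a single event is known to run to completion.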
