+1, we should definitely do this, Lewis.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Lewis John Mcgibbney <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, October 8, 2014 at 5:54 PM
To: "[email protected]" <[email protected]>
Subject: Re: what is batch stub? Is it necessary?

>Folks,
>Is it possible to create a parent issue for defining XSDs for all of the
>XML files we need in OODT?
>I do not know them all, but from this thread alone, it is clear that we
>could do with setting some kind of restrictions on what can be included
>within task and configuration XML within OODT.
>Thoughts?
>Lewis
>
>On Wed, Oct 8, 2014 at 5:44 PM, Verma, Rishi (398J) <[email protected]> wrote:
>
>> Hi Val,
>>
>> Yep - here's a link to the tasks.xml file:
>>
>> https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/policy/tasks.xml
>>
>> > The problem is that the ExternScriptTaskInstance is unable to recognize
>> > the command line arguments that I want to pass to the crawler_launcher
>> > script.
>>
>> Hmm.. could you share your workflow manager log, or better yet, the
>> batch_stub output? Curious to see what error is thrown.
>>
>> Is a script file being generated for your PGE? For example, inside your
>> [PGE_HOME] directory, and within the particular job directory created for
>> your execution of a workflow, you will see some files starting with
>> "sciPgeExeScript_". You'll find one for your pgeConfig, and you can check
>> to see what the PGE commands actually translate into, with respect to a
>> shell script format. If that file is there, take a look at it, and validate
>> whether the command works within the script (i.e. copy/paste and run the
>> crawler command manually).
>>
>> Another suggestion is to take a step back, and build up slowly, i.e.:
>> 1. Do an "echo" command within your PGE first. (e.g. <cmd> echo "Hello
>> APL." > /tmp/test.txt</cmd>)
>> 2. If above works, do a crawler_launcher empty command (e.g.
>> <cmd>/path/to/oodt/crawler/bin/crawler_launcher</cmd>) and verify the
>> batch_stub or Workflow Manager prints some kind of output when you run the
>> workflow.
>> 3. Build up your crawler_launcher command piece by piece to see where it
>> is failing
>>
>> Thanks,
>> Rishi
>>
>> On Oct 8, 2014, at 4:24 PM, Mallder, Valerie <[email protected]> wrote:
>>
>> > Hi Rishi,
>> >
>> > Thank you very much for pointing me to your working example. This is
>> > very helpful. My pgeConfig looks very similar to yours. So, I commented
>> > out the resource manager like you suggested and tried running again without
>> > the resource manager. And my problem still exists. The problem is that the
>> > ExternScriptTaskInstance is unable to recognize the command line arguments
>> > that I want to pass to the crawler_launcher script. Could you send me a
>> > link to your tasks.xml file? I'm curious as to how you defined your task.
>> > My pgeConfig and tasks.xml are below.
>> >
>> > Thanks!
>> > Val
>> >
>> >
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <pgeConfig>
>> >
>> >   <!-- How to run the PGE -->
>> >   <exe dir="[JobDir]" shell="/bin/sh" envReplace="true">
>> >     <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation --launchAutoCrawler \
>> >       --filemgrUrl [FILEMGR_URL] \
>> >       --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
>> >       --productPath [JobInputDir] \
>> >       --mimeExtractorRepo [OODT_HOME]/extensions/policy/mime-extractor-map.xml \
>> >       --actionIds MoveFileToLevel0Dir</cmd>
>> >   </exe>
>> >
>> >   <!-- Files to ingest -->
>> >   <output/>
>> >   </output>
>> >
>> >   <!-- Custom metadata to add to output files -->
>> >   <customMetadata>
>> >     <metadata key="JobDir" val="[OODT_HOME]"/>
>> >     <metadata key="JobInputDir" val="[FEI_DROP_DIR]"/>
>> >     <metadata key="JobOutputDir" val="[JobDir]/data/pge/jobs"/>
>> >     <metadata key="JobLogDir" val="[JobDir]/data/pge/logs"/>
>> >   </customMetadata>
>> >
>> > </pgeConfig>
>> >
>> >
>> >
>> > <!--
tasks.xml **************************************************-->
>> >
>> > <cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>> >
>> >   <task id="urn:oodt:crawlerLauncherId" name="crawlerLauncherName"
>> >         class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
>> >     <conditions/> <!-- There are no pre execution conditions right now -->
>> >     <configuration>
>> >
>> >       <property name="ShellType" value="/bin/sh" />
>> >       <property name="PathToScript" value="[CRAWLER_HOME]/bin/crawler_launcher" envReplace="true" />
>> >
>> >       <property name="PGETask_Name" value="crawler_launcher PGE Task"/>
>> >       <property name="PGETask_ConfigFilePath" value="[OODT_HOME]/extensions/config/crawler-pge-config.xml" envReplace="true" />
>> >     </configuration>
>> >   </task>
>> >
>> > </cas:tasks>
>> >
>> > Valerie A. Mallder
>> > New Horizons Deputy Mission System Engineer
>> > Johns Hopkins University/Applied Physics Laboratory
>> >
>> >
>> >> -----Original Message-----
>> >> From: Verma, Rishi (398J) [mailto:[email protected]]
>> >> Sent: Wednesday, October 08, 2014 6:01 PM
>> >> To: [email protected]
>> >> Subject: Re: what is batch stub? Is it necessary?
>> >>
>> >> Hi Valerie,
>> >>
>> >>>>>> All I am trying to do is run "crawler_launcher" as a workflow task
>> >>>>>> in the CAS PGE environment.
>> >>
>> >> Interesting. I have a working example here [1] you can look at that does this exact
>> >> thing.
>> >>
>> >>>>>> So, if "batchstub" is necessary in this scenario, please tell me
>> >>>>>> what it is, why it is necessary, and how to run it (please provide
>> >>>>>> exact syntax to put in my startup shell script, because I would
>> >>>>>> never be able to figure it out for myself and I don't want to have
>> >>>>>> to bother everyone again.)
>> >>
>> >> Batchstub is only necessary if your Workflow Manager is sending jobs to Resource
>> >> Manager for execution (where the default execution is to run the job in something
>> >> called a "batch stub" executable). Think of batch stubs as a small wrapper
>> >> program that takes a bundle of executable instructions from Resource Manager,
>> >> and executes them in a shell environment within a given remote (or local) machine.
>> >>
>> >> Here's my suggestion:
>> >> 1. Like Paul suggested, go to $OODT_HOME/resmgr/bin, and execute the
>> >> following command (it'll start a batch stub in a terminal on port 2001):
>> >> > ./batch_stub 2001
>> >>
>> >> If the above step doesn't fix your problem, you can also try having Workflow
>> >> Manager NOT send jobs to Resource Manager for execution, and instead execute
>> >> jobs locally through Workflow Manager itself (on localhost only!). To disable job
>> >> transfer to Resource Manager, you'll need to modify the Workflow Manager
>> >> properties file ($OODT_HOME/wmgr/etc/workflow.properties), and specifically
>> >> comment out the "org.apache.oodt.cas.workflow.engine.resourcemgr.url" line.
>> >> I've done this in my example code below, see [2] for an exact example of this.
>> >> After modifying workflow.properties, make sure to restart workflow manager
>> >> ($OODT_HOME/wmgr/bin/wmgr stop followed by $OODT_HOME/wmgr/bin/wmgr
>> >> start).
>> >>
>> >> Thanks,
>> >> Rishi
>> >>
>> >> [1] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/pge/src/main/resources/policy/netscan-getipv4entriesrandomsample.xml
>> >> [2] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/etc/workflow.properties
>> >>
>> >> On Oct 8, 2014, at 2:31 PM, Ramirez, Paul M (398J)
>> >> <[email protected]> wrote:
>> >>
>> >>> Valerie,
>> >>>
>> >>> I would have thought it would have just not used a batch stub by default.
>> >>> That said, if you go into the $OODT_HOME/resmgr/bin there should be a script to start a
>> >>> batch stub. Right now on my phone I forget the name of the script but if you more
>> >>> the file you will see the Java class name that corresponds to below. You should
>> >>> specify a port when you run the script which from the looks of the output below
>> >>> should be 2001.
>> >>>
>> >>> HTH,
>> >>> Paul R
>> >>>
>> >>> Sent from my iPhone
>> >>>
>> >>>> On Oct 8, 2014, at 2:04 PM, Mallder, Valerie <[email protected]> wrote:
>> >>>>
>> >>>> Well then, I'm proud to be a member :) (I think .... )
>> >>>>
>> >>>>
>> >>>> Valerie A. Mallder
>> >>>> New Horizons Deputy Mission System Engineer Johns Hopkins
>> >>>> University/Applied Physics Laboratory
>> >>>>
>> >>>>
>> >>>>> -----Original Message-----
>> >>>>> From: Bruce Barkstrom [mailto:[email protected]]
>> >>>>> Sent: Wednesday, October 08, 2014 4:54 PM
>> >>>>> To: [email protected]
>> >>>>> Subject: Re: what is batch stub? Is it necessary?
>> >>>>>
>> >>>>> You have every right to bother everyone.
>> >>>>> You won't get what you need unless you do.
>> >>>>>
>> >>>>> You get one honorary membership in the Society of General Agitators
>> >>>>> - at the rank of Major Agitator.
>> >>>>>
>> >>>>> Bruce B.
>> >>>>>
>> >>>>> On Wed, Oct 8, 2014 at 4:49 PM, Mallder, Valerie
>> >>>>> <[email protected]> wrote:
>> >>>>>
>> >>>>>> Hello,
>> >>>>>>
>> >>>>>> I am still having trouble getting my CAS PGE crawler task to run
>> >>>>>> due to
>> >>>>>> http://localhost:2001 being "down".
>> >>>>>> I have spent the last 2 days
>> >>>>>> tracing through the resource manager code and tracked this down to
>> >>>>>> line 146 of LRUScheduler where the XmlRpcBatchMgr is failing to
>> >>>>>> execute the task remotely, because line 75 of
>> >>>>>> XmlRpcBatchMgrProxy (that was instantiated by XmlRpcBatchMgr on its
>> >>>>>> line 74) is trying to call "isAlive" on the webservice named
>> >>>>>> "batchstub" which, to my knowledge, is not running because I have not done
>> >>>>>> anything explicitly to run it.
>> >>>>>>
>> >>>>>> All I am trying to do is run "crawler_launcher" as a workflow task
>> >>>>>> in the CAS PGE environment. I had it running perfectly before I
>> >>>>>> started trying to make it run as part of a workflow. I really miss
>> >>>>>> my crawler and really want it to run again :(
>> >>>>>>
>> >>>>>> So, if "batchstub" is necessary in this scenario, please tell me
>> >>>>>> what it is, why it is necessary, and how to run it (please provide
>> >>>>>> exact syntax to put in my startup shell script, because I would
>> >>>>>> never be able to figure it out for myself and I don't want to have
>> >>>>>> to bother everyone again.)
>> >>>>>>
>> >>>>>> Thanks so much!
>> >>>>>>
>> >>>>>> Val
>> >>>>>>
>> >>>>>>
>> >>>>>> Valerie A. Mallder
>> >>>>>>
>> >>>>>> New Horizons Deputy Mission System Engineer The Johns Hopkins
>> >>>>>> University/Applied Physics Laboratory
>> >>>>>> 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
>> >>>>>> 240-228-7846 (Office) 410-504-2233 (Blackberry)
>> >>>>>>
>> >>
>> >> ---
>> >> Rishi Verma
>> >> NASA Jet Propulsion Laboratory
>> >> California Institute of Technology
>> >
>>
>> ---
>> Rishi Verma
>> NASA Jet Propulsion Laboratory
>> California Institute of Technology
>>
>
>--
>*Lewis*
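
To make the idea of "restrictions on what can be included within task and configuration XML" concrete, here is a minimal Python sketch of the kind of checks an XSD could eventually enforce. It is not part of OODT: the function name check_tasks_xml and the specific rules (each task needs id, name and class; each property needs name and value; envReplace is "true" or "false") are assumptions inferred only from the tasks.xml quoted in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the tasks.xml quoted in the thread.
CAS_NS = "http://oodt.jpl.nasa.gov/1.0/cas"

# Sample modelled on the tasks.xml in the thread (PGETask_* properties omitted).
SAMPLE = """<cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <task id="urn:oodt:crawlerLauncherId" name="crawlerLauncherName"
        class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
    <conditions/>
    <configuration>
      <property name="ShellType" value="/bin/sh"/>
      <property name="PathToScript" value="[CRAWLER_HOME]/bin/crawler_launcher" envReplace="true"/>
    </configuration>
  </task>
</cas:tasks>"""


def check_tasks_xml(xml_text):
    """Return a list of problems found; an empty list means all checks passed.

    The rules here are hypothetical, not an official OODT schema.
    """
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}tasks" % CAS_NS:
        problems.append("root element should be cas:tasks, found %s" % root.tag)
    for task in root.findall("task"):
        label = task.get("id") or "<task without id>"
        # Assumed rule: every task carries id, name and class.
        for attr in ("id", "name", "class"):
            if not task.get(attr):
                problems.append("%s: missing attribute '%s'" % (label, attr))
        config = task.find("configuration")
        if config is None:
            continue
        for prop in config.findall("property"):
            # Assumed rule: every property carries name and value.
            for attr in ("name", "value"):
                if prop.get(attr) is None:
                    problems.append("%s: property missing '%s'" % (label, attr))
            env = prop.get("envReplace")
            if env is not None and env not in ("true", "false"):
                problems.append("%s: envReplace must be 'true' or 'false'" % label)
    return problems


if __name__ == "__main__":
    for problem in check_tasks_xml(SAMPLE) or ["no problems found"]:
        print(problem)
```

A real XSD would additionally let editors and the Workflow Manager reject malformed policy files before a task is ever scheduled; the script above only illustrates the sort of constraints such a schema might encode.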
