Hi Val,

Yep - here’s a link to the tasks.xml file: https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/policy/tasks.xml

> The problem is that the ExternScriptTaskInstance is unable to recognize the
> command line arguments that I want to pass to the crawler_launcher script.

Hmm.. could you share your workflow manager log, or better yet, the batch_stub output? Curious to see what error is thrown.

Is a script file being generated for your PGE? For example, inside your [PGE_HOME] directory, and within the particular job directory created for your execution of a workflow, you will see some files starting with “sciPgeExeScript_…”. You’ll find one for your pgeConfig, and you can check to see what the PGE commands actually translate into, with respect to a shell script format.

If that file is there, take a look at it, and validate whether the command works within the script (i.e. copy/paste and run the crawler command manually).

Another suggestion is to take a step back, and build up slowly, i.e.:
1. Do an “echo” command within your PGE first (e.g. <cmd>echo "Hello APL." > /tmp/test.txt</cmd>).
2. If the above works, do an empty crawler_launcher command (e.g. <cmd>/path/to/oodt/crawler/bin/crawler_launcher</cmd>) and verify the batch_stub or Workflow Manager prints some kind of output when you run the workflow.
3. Build up your crawler_launcher command piece by piece to see where it is failing.

Thanks,
Rishi

On Oct 8, 2014, at 4:24 PM, Mallder, Valerie <[email protected]> wrote:

> Hi Rishi,
>
> Thank you very much for pointing me to your working example. This is very
> helpful. My pgeConfig looks very similar to yours. So, I commented out the
> resource manager like you suggested and tried running again without the
> resource manager. And my problem still exists. The problem is that the
> ExternScriptTaskInstance is unable to recognize the command line arguments
> that I want to pass to the crawler_launcher script. Could you send me a link
> to your tasks.xml file? I'm curious as to how you defined your task. My
> pgeConfig and tasks.xml are below.
>
> Thanks!
> Val
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <pgeConfig>
>
>   <!-- How to run the PGE -->
>   <exe dir="[JobDir]" shell="/bin/sh" envReplace="true">
>     <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation --launchAutoCrawler \
>       --filemgrUrl [FILEMGR_URL] \
>       --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
>       --productPath [JobInputDir] \
>       --mimeExtractorRepo [OODT_HOME]/extensions/policy/mime-extractor-map.xml \
>       --actionIds MoveFileToLevel0Dir</cmd>
>   </exe>
>
>   <!-- Files to ingest -->
>   <output/>
>   </output>
>
>   <!-- Custom metadata to add to output files -->
>   <customMetadata>
>     <metadata key="JobDir" val="[OODT_HOME]"/>
>     <metadata key="JobInputDir" val="[FEI_DROP_DIR]"/>
>     <metadata key="JobOutputDir" val="[JobDir]/data/pge/jobs"/>
>     <metadata key="JobLogDir" val="[JobDir]/data/pge/logs"/>
>   </customMetadata>
>
> </pgeConfig>
>
>
>
> <!-- tasks.xml **************************************************-->
>
> <cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>
>   <task id="urn:oodt:crawlerLauncherId" name="crawlerLauncherName"
>         class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
>     <conditions/> <!-- There are no pre execution conditions right now -->
>     <configuration>
>
>       <property name="ShellType" value="/bin/sh" />
>       <property name="PathToScript"
>                 value="[CRAWLER_HOME]/bin/crawler_launcher" envReplace="true" />
>
>       <property name="PGETask_Name" value="crawler_launcher PGE Task"/>
>       <property name="PGETask_ConfigFilePath"
>                 value="[OODT_HOME]/extensions/config/crawler-pge-config.xml"
>                 envReplace="true" />
>     </configuration>
>   </task>
>
> </cas:tasks>
>
> Valerie A. Mallder
> New Horizons Deputy Mission System Engineer
> Johns Hopkins University/Applied Physics Laboratory
>
>
>> -----Original Message-----
>> From: Verma, Rishi (398J) [mailto:[email protected]]
>> Sent: Wednesday, October 08, 2014 6:01 PM
>> To: [email protected]
>> Subject: Re: what is batch stub? Is it necessary?
>>
>> Hi Valerie,
>>
>>>>>> All I am trying to do is run "crawler_launcher" as a workflow task
>>>>>> in the CAS PGE environment.
>>
>> Interesting. I have a working example here [1] you can look at that does
>> this exact thing.
>>
>>>>>> So, if "batchstub" is necessary in this scenario, please tell me
>>>>>> what it is, why it is necessary, and how to run it (please provide
>>>>>> exact syntax to put in my startup shell script, because I would
>>>>>> never be able to figure it out for myself and I don't want to have
>>>>>> to bother everyone again.)
>>
>> Batchstub is only necessary if your Workflow Manager is sending jobs to
>> Resource Manager for execution (where the default execution is to run the
>> job in something called a “batch stub” executable). Think of batch stubs as
>> a small wrapper program that takes a bundle of executable instructions from
>> Resource Manager, and executes them in a shell environment within a given
>> remote (or local) machine.
>>
>> Here’s my suggestion:
>> 1. Like Paul suggested, go to $OODT_HOME/resmgr/bin, and execute the
>> following command (it’ll start a batch stub in a terminal on port 2001):
>>> ./batch_stub 2001
>>
>> If the above step doesn’t fix your problem, you can also try having Workflow
>> Manager NOT send jobs to Resource Manager for execution, and instead execute
>> jobs locally through Workflow Manager itself (on localhost only!). To
>> disable job transfer to Resource Manager, you’ll need to modify the Workflow
>> Manager properties file ($OODT_HOME/wmgr/etc/workflow.properties), and
>> specifically comment out the
>> “org.apache.oodt.cas.workflow.engine.resourcemgr.url” line. I’ve done this
>> in my example code below, see [2] for an exact example of this.
>> After modifying workflow.properties, make sure to restart workflow manager
>> ($OODT_HOME/wmgr/bin/wmgr stop followed by $OODT_HOME/wmgr/bin/wmgr
>> start).
>>
>> Thanks,
>> Rishi
>>
>> [1] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/pge/src/main/resources/policy/netscan-getipv4entriesrandomsample.xml
>> [2] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/etc/workflow.properties
>>
>> On Oct 8, 2014, at 2:31 PM, Ramirez, Paul M (398J) <[email protected]> wrote:
>>
>>> Valerie,
>>>
>>> I would have thought it would have just not used a batch stub by default.
>>> That said if you go into the $OODT_HOME/resmgr/bin there should be a script
>>> to start a batch stub. Right now on my phone I forget the name of the script
>>> but if you more the file you will see the Java class name that corresponds
>>> to below. You should specify a port when you run the script which from the
>>> looks of the output below should be 2001.
>>>
>>> HTH,
>>> Paul R
>>>
>>> Sent from my iPhone
>>>
>>>> On Oct 8, 2014, at 2:04 PM, Mallder, Valerie <[email protected]> wrote:
>>>>
>>>> Well then, I'm proud to be a member :) (I think .... )
>>>>
>>>>
>>>> Valerie A. Mallder
>>>> New Horizons Deputy Mission System Engineer
>>>> Johns Hopkins University/Applied Physics Laboratory
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Bruce Barkstrom [mailto:[email protected]]
>>>>> Sent: Wednesday, October 08, 2014 4:54 PM
>>>>> To: [email protected]
>>>>> Subject: Re: what is batch stub? Is it necessary?
>>>>>
>>>>> You have every right to bother everyone.
>>>>> You won't get what you need unless you do.
>>>>>
>>>>> You get one honorary membership in the Society of General Agitators
>>>>> - at the rank of Major Agitator.
>>>>>
>>>>> Bruce B.
>>>>>
>>>>> On Wed, Oct 8, 2014 at 4:49 PM, Mallder, Valerie <[email protected]> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am still having trouble getting my CAS PGE crawler task to run
>>>>>> due to
>>>>>> http://localhost:2001 being "down".
>>>>>> I have spent the last 2 days
>>>>>> tracing through the resource manager code and tracked this down to
>>>>>> line 146 of LRUScheduler where the XmlRpcBatchMgr is failing to
>>>>>> execute the task remotely, because line 75 of
>>>>>> XmlRpcBatchMgrProxy (that was instantiated by XmlRpcBatchMgr on its
>>>>>> line 74) is trying to call "isAlive" on the webservice named
>>>>>> "batchstub" which, to my knowledge, is not running because I have not
>>>>>> done anything explicitly to run it.
>>>>>>
>>>>>> All I am trying to do is run "crawler_launcher" as a workflow task
>>>>>> in the CAS PGE environment. I had it running perfectly before I
>>>>>> started trying to make it run as part of a workflow. I really miss
>>>>>> my crawler and really want it to run again :(
>>>>>>
>>>>>> So, if "batchstub" is necessary in this scenario, please tell me
>>>>>> what it is, why it is necessary, and how to run it (please provide
>>>>>> exact syntax to put in my startup shell script, because I would
>>>>>> never be able to figure it out for myself and I don't want to have
>>>>>> to bother everyone again.)
>>>>>>
>>>>>> Thanks so much!
>>>>>>
>>>>>> Val
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Valerie A. Mallder
>>>>>>
>>>>>> New Horizons Deputy Mission System Engineer
>>>>>> The Johns Hopkins University/Applied Physics Laboratory
>>>>>> 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
>>>>>> 240-228-7846 (Office) 410-504-2233 (Blackberry)
>>>>>>
>>>>>>
>>
>> ---
>> Rishi Verma
>> NASA Jet Propulsion Laboratory
>> California Institute of Technology
>
---
Rishi Verma
NASA Jet Propulsion Laboratory
California Institute of Technology
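
For anyone following the thread later: the check suggested in the top reply (look in the job directory for generated files starting with "sciPgeExeScript_") could be done with a small sh helper along these lines. This is only a sketch. The function name `list_pge_scripts` and the directory argument are illustrative assumptions, not part of OODT; only the file-name prefix comes from the thread.

```shell
#!/bin/sh
# Sketch only: list_pge_scripts is a hypothetical helper, not an OODT tool.
# It prints each file directly under the given job directory whose name
# starts with the "sciPgeExeScript_" prefix mentioned in the thread.
list_pge_scripts() {
    job_dir="$1"
    for f in "$job_dir"/sciPgeExeScript_*; do
        # When nothing matches, the glob stays unexpanded; skip that case.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}
```

Calling it as `list_pge_scripts /path/to/job/dir` (a placeholder path) prints one matching file per line, or nothing if no script was generated. Each listed file can then be opened to see what the pgeConfig commands were translated into, and the crawler command can be copied out and run by hand.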
