Thanks, Cameron! I'm sure it contributed to some of my troubles. Maybe not the specific one I was having at the time, but it would have bitten me at some point. Good catch!
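For what it's worth, a quick well-formedness check catches that kind of malformed tag before anything gets restarted. A minimal sketch only, assuming xmllint (from libxml2) is installed and OODT_HOME is exported so the path matches the PGETask_ConfigFilePath in the tasks.xml quoted further down:

    # Parse the PGE config; xmllint prints nothing on success, reports the first
    # well-formedness error and exits non-zero otherwise, so it can guard a startup script.
    xmllint --noout $OODT_HOME/extensions/config/crawler-pge-config.xml \
      && echo "crawler-pge-config.xml is well-formed"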
Thanks,
Val

Valerie A. Mallder
New Horizons Deputy Mission System Engineer
Johns Hopkins University/Applied Physics Laboratory

> -----Original Message-----
> From: Cameron Goodale [mailto:[email protected]]
> Sent: Wednesday, October 08, 2014 11:54 PM
> To: [email protected]
> Subject: Re: what is batch stub? Is it necessary?
>
> Valerie,
>
> This could be nothing, or it could be the root cause... your output XML tags are malformed.
>
> <!-- Files to ingest -->
> <output/>
> </output>
>
> Should be:
>
> <!-- Files to ingest -->
> <output>
> </output>
>
> No trailing slash in the opening tag. It might be failing since it cannot parse the XML cleanly. It doesn't explain the batchstub stuff, but it might be related to the latest challenge you cited:
>
> *"The problem is that the ExternScriptTaskInstance is unable to recognize the command line arguments that I want to pass to the crawler_launcher script."*
>
> In my experience the XML is super finicky.
>
> Good Luck, and keep the questions coming. We are here to help.
>
> -Cameron
>
> On Wed, Oct 8, 2014 at 7:37 PM, Ramirez, Paul M (398J) <[email protected]> wrote:
>
> > +1 billion
> >
> > --Paul
> >
> > Sent from my iPhone
> >
> > On Oct 8, 2014, at 5:55 PM, Lewis John Mcgibbney <[email protected]> wrote:
> >
> > > Folks,
> > > Is it possible to create a parent issue for defining XSDs for all of the XML files we need in OODT?
> > > I do not know them all, but from this thread alone it is clear that we could do with setting some kind of restrictions on what can be included within task and configuration XML within OODT.
> > > Thoughts?
> > > Lewis
> > >
> > > On Wed, Oct 8, 2014 at 5:44 PM, Verma, Rishi (398J) <[email protected]> wrote:
> > >
> > >> Hi Val,
> > >>
> > >> Yep - here's a link to the tasks.xml file:
> > >>
> > >> https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/policy/tasks.xml
> > >>
> > >>> The problem is that the ExternScriptTaskInstance is unable to recognize the command line arguments that I want to pass to the crawler_launcher script.
> > >>
> > >> Hmm... could you share your workflow manager log, or better yet, the batch_stub output? Curious to see what error is thrown.
> > >>
> > >> Is a script file being generated for your PGE? For example, inside your [PGE_HOME] directory, and within the particular job directory created for your execution of a workflow, you will see some files starting with "sciPgeExeScript_...". You'll find one for your pgeConfig, and you can check to see what the PGE commands actually translate into, with respect to a shell script format. If that file is there, take a look at it, and validate whether the command works within the script (i.e. copy/paste and run the crawler command manually).
> > >>
> > >> Another suggestion is to take a step back and build up slowly, i.e.:
> > >> 1. Do an "echo" command within your PGE first (e.g. <cmd>echo "Hello APL." > /tmp/test.txt</cmd>).
> > >> 2. If the above works, do an empty crawler_launcher command (e.g. <cmd>/path/to/oodt/crawler/bin/crawler_launcher</cmd>) and verify the batch_stub or Workflow Manager prints some kind of output when you run the workflow.
> > >> 3. Build up your crawler_launcher command piece by piece to see where it is failing.
> > >>
> > >> Thanks,
> > >> Rishi
> > >>
> > >> On Oct 8, 2014, at 4:24 PM, Mallder, Valerie <[email protected]> wrote:
> > >>
> > >>> Hi Rishi,
> > >>>
> > >>> Thank you very much for pointing me to your working example. This is very helpful. My pgeConfig looks very similar to yours. So, I commented out the resource manager like you suggested and tried running again without the resource manager. And my problem still exists. The problem is that the ExternScriptTaskInstance is unable to recognize the command line arguments that I want to pass to the crawler_launcher script. Could you send me a link to your tasks.xml file? I'm curious as to how you defined your task. My pgeConfig and tasks.xml are below.
> > >>>
> > >>> Thanks!
> > >>> Val
> > >>>
> > >>>
> > >>> <?xml version="1.0" encoding="UTF-8"?>
> > >>> <pgeConfig>
> > >>>
> > >>>   <!-- How to run the PGE -->
> > >>>   <exe dir="[JobDir]" shell="/bin/sh" envReplace="true">
> > >>>     <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation --launchAutoCrawler \
> > >>>       --filemgrUrl [FILEMGR_URL] \
> > >>>       --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
> > >>>       --productPath [JobInputDir] \
> > >>>       --mimeExtractorRepo [OODT_HOME]/extensions/policy/mime-extractor-map.xml \
> > >>>       --actionIds MoveFileToLevel0Dir</cmd>
> > >>>   </exe>
> > >>>
> > >>>   <!-- Files to ingest -->
> > >>>   <output/>
> > >>>   </output>
> > >>>
> > >>>   <!-- Custom metadata to add to output files -->
> > >>>   <customMetadata>
> > >>>     <metadata key="JobDir" val="[OODT_HOME]"/>
> > >>>     <metadata key="JobInputDir" val="[FEI_DROP_DIR]"/>
> > >>>     <metadata key="JobOutputDir" val="[JobDir]/data/pge/jobs"/>
> > >>>     <metadata key="JobLogDir" val="[JobDir]/data/pge/logs"/>
> > >>>   </customMetadata>
> > >>>
> > >>> </pgeConfig>
> > >>>
> > >>>
> > >>> <!-- tasks.xml **************************************************-->
> > >>>
> > >>> <cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
> > >>>
> > >>>   <task id="urn:oodt:crawlerLauncherId" name="crawlerLauncherName" class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
> > >>>     <conditions/> <!-- There are no pre execution conditions right now -->
> > >>>     <configuration>
> > >>>
> > >>>       <property name="ShellType" value="/bin/sh" />
> > >>>       <property name="PathToScript" value="[CRAWLER_HOME]/bin/crawler_launcher" envReplace="true" />
> > >>>
> > >>>       <property name="PGETask_Name" value="crawler_launcher PGE Task"/>
> > >>>       <property name="PGETask_ConfigFilePath" value="[OODT_HOME]/extensions/config/crawler-pge-config.xml" envReplace="true" />
> > >>>     </configuration>
> > >>>   </task>
> > >>>
> > >>> </cas:tasks>
> > >>>
> > >>> Valerie A. Mallder
> > >>> New Horizons Deputy Mission System Engineer
> > >>> Johns Hopkins University/Applied Physics Laboratory
> > >>>
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Verma, Rishi (398J) [mailto:[email protected]]
> > >>>> Sent: Wednesday, October 08, 2014 6:01 PM
> > >>>> To: [email protected]
> > >>>> Subject: Re: what is batch stub? Is it necessary?
> > >>>>
> > >>>> Hi Valerie,
> > >>>>
> > >>>>>>>> All I am trying to do is run "crawler_launcher" as a workflow task in the CAS PGE environment.
> > >>>>
> > >>>> Interesting.
> > >>>> I have a working example here [1] you can look at that does this exact thing.
> > >>>>
> > >>>>>>>> So, if "batchstub" is necessary in this scenario, please tell me what it is, why it is necessary, and how to run it (please provide exact syntax to put in my startup shell script, because I would never be able to figure it out for myself and I don't want to have to bother everyone again.)
> > >>>>
> > >>>> Batchstub is only necessary if your Workflow Manager is sending jobs to Resource Manager for execution (where the default execution is to run the job in something called a "batch stub" executable). Think of batch stubs as a small wrapper program that takes a bundle of executable instructions from Resource Manager, and executes them in a shell environment within a given remote (or local) machine.
> > >>>>
> > >>>> Here's my suggestion:
> > >>>> 1. Like Paul suggested, go to $OODT_HOME/resmgr/bin, and execute the following command (it'll start a batch stub in a terminal on port 2001):
> > >>>>> ./batch_stub 2001
> > >>>>
> > >>>> If the above step doesn't fix your problem, you can also try having Workflow Manager NOT send jobs to Resource Manager for execution, and instead execute jobs locally through Workflow Manager itself (on localhost only!). To disable job transfer to Resource Manager, you'll need to modify the Workflow Manager properties file ($OODT_HOME/wmgr/etc/workflow.properties), and specifically comment out the "org.apache.oodt.cas.workflow.engine.resourcemgr.url" line. I've done this in my example code below; see [2] for an exact example of this. After modifying workflow.properties, make sure to restart Workflow Manager ($OODT_HOME/wmgr/bin/wmgr stop followed by $OODT_HOME/wmgr/bin/wmgr start).
> > >>>>
> > >>>> Thanks,
> > >>>> Rishi
> > >>>>
> > >>>> [1] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/pge/src/main/resources/policy/netscan-getipv4entriesrandomsample.xml
> > >>>> [2] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/etc/workflow.properties
> > >>>>
> > >>>> On Oct 8, 2014, at 2:31 PM, Ramirez, Paul M (398J) <[email protected]> wrote:
> > >>>>
> > >>>>> Valerie,
> > >>>>>
> > >>>>> I would have thought it would have just not used a batch stub by default. That said, if you go into $OODT_HOME/resmgr/bin there should be a script to start a batch stub. Right now on my phone I forget the name of the script, but if you more the file you will see the Java class name that corresponds to below. You should specify a port when you run the script, which from the looks of the output below should be 2001.
> > >>>>>
> > >>>>> HTH,
> > >>>>> Paul R
> > >>>>>
> > >>>>> Sent from my iPhone
> > >>>>>
> > >>>>> On Oct 8, 2014, at 2:04 PM, Mallder, Valerie <[email protected]> wrote:
> > >>>>>
> > >>>>>> Well then, I'm proud to be a member :) (I think .... )
> > >>>>>>
> > >>>>>>
> > >>>>>> Valerie A. Mallder
> > >>>>>> New Horizons Deputy Mission System Engineer
> > >>>>>> Johns Hopkins University/Applied Physics Laboratory
> > >>>>>>
> > >>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: Bruce Barkstrom [mailto:[email protected]]
> > >>>>>>> Sent: Wednesday, October 08, 2014 4:54 PM
> > >>>>>>> To: [email protected]
> > >>>>>>> Subject: Re: what is batch stub? Is it necessary?
> > >>>>>>>
> > >>>>>>> You have every right to bother everyone.
> > >>>>>>> You won't get what you need unless you do.
> > >>>>>>>
> > >>>>>>> You get one honorary membership in the Society of General Agitators - at the rank of Major Agitator.
> > >>>>>>>
> > >>>>>>> Bruce B.
> > >>>>>>>
> > >>>>>>> On Wed, Oct 8, 2014 at 4:49 PM, Mallder, Valerie <[email protected]> wrote:
> > >>>>>>>
> > >>>>>>>> Hello,
> > >>>>>>>>
> > >>>>>>>> I am still having trouble getting my CAS PGE crawler task to run due to http://localhost:2001 being "down". I have spent the last 2 days tracing through the resource manager code and tracked this down to line 146 of LRUScheduler, where the XmlRpcBatchMgr is failing to execute the task remotely, because on line 75 of XmlRpcBatchMgrProxy (which was instantiated by XmlRpcBatchMgr on its line 74) it is trying to call "isAlive" on the webservice named "batchstub", which, to my knowledge, is not running because I have not done anything explicitly to run it.
> > >>>>>>>>
> > >>>>>>>> All I am trying to do is run "crawler_launcher" as a workflow task in the CAS PGE environment. I had it running perfectly before I started trying to make it run as part of a workflow. I really miss my crawler and really want it to run again :(
> > >>>>>>>>
> > >>>>>>>> So, if "batchstub" is necessary in this scenario, please tell me what it is, why it is necessary, and how to run it (please provide exact syntax to put in my startup shell script, because I would never be able to figure it out for myself and I don't want to have to bother everyone again.)
> > >>>>>>>>
> > >>>>>>>> Thanks so much!
> > >>>>>>>>
> > >>>>>>>> Val
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Valerie A. Mallder
> > >>>>>>>>
> > >>>>>>>> New Horizons Deputy Mission System Engineer
> > >>>>>>>> The Johns Hopkins University/Applied Physics Laboratory
> > >>>>>>>> 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
> > >>>>>>>> 240-228-7846 (Office) 410-504-2233 (Blackberry)
> > >>>>
> > >>>> ---
> > >>>> Rishi Verma
> > >>>> NASA Jet Propulsion Laboratory
> > >>>> California Institute of Technology
> > >>
> > >> ---
> > >> Rishi Verma
> > >> NASA Jet Propulsion Laboratory
> > >> California Institute of Technology
> > >
> > >
> > > --
> > > *Lewis*
> >
> >
> > --
> > Sent from a Tin Can attached to a String
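Rishi's two options from this thread, condensed into a single sketch (assumptions: OODT_HOME is exported in the shell and the stock resmgr/wmgr scripts referenced above are in place):

    # Option 1: keep handing jobs to Resource Manager, but start the batch stub
    # it expects on port 2001 (the port the scheduler was failing to reach above).
    cd $OODT_HOME/resmgr/bin
    ./batch_stub 2001

    # Option 2: skip Resource Manager and let Workflow Manager run jobs locally:
    # comment out the org.apache.oodt.cas.workflow.engine.resourcemgr.url line in
    # $OODT_HOME/wmgr/etc/workflow.properties, then restart Workflow Manager so
    # the change takes effect.
    $OODT_HOME/wmgr/bin/wmgr stop
    $OODT_HOME/wmgr/bin/wmgr start

As Rishi notes, option 2 only runs jobs on localhost.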
