ack, git push ------------------------ Chris Mattmann [email protected]
-----Original Message-----
From: "Ramirez, Paul M (398J)" <[email protected]>
Reply-To: <[email protected]>
Date: Thursday, October 9, 2014 at 4:37 AM
To: "<[email protected]>" <[email protected]>
Subject: Re: what is batch stub? Is it necessary?

>+1 billion
>
>--Paul
>
>Sent from my iPhone
>
>> On Oct 8, 2014, at 5:55 PM, Lewis John Mcgibbney <[email protected]> wrote:
>>
>> Folks,
>> Is it possible to create a parent issue for defining XSDs for all of the
>> XML files we need in OODT?
>> I do not know them all, but from this thread alone, it is clear that we
>> could do with setting some kind of restrictions on what can be included
>> within task and configuration XML within OODT.
>> Thoughts?
>> Lewis
>>
>> On Wed, Oct 8, 2014 at 5:44 PM, Verma, Rishi (398J) <[email protected]> wrote:
>>
>>> Hi Val,
>>>
>>> Yep - here's a link to the tasks.xml file:
>>>
>>> https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/policy/tasks.xml
>>>
>>>> The problem is that the ExternScriptTaskInstance is unable to recognize
>>>> the command line arguments that I want to pass to the crawler_launcher
>>>> script.
>>>
>>> Hmm.. could you share your workflow manager log, or better yet, the
>>> batch_stub output? Curious to see what error is thrown.
>>>
>>> Is a script file being generated for your PGE? For example, inside your
>>> [PGE_HOME] directory, and within the particular job directory created for
>>> your execution of a workflow, you will see some files starting with
>>> "sciPgeExeScript_". You'll find one for your pgeConfig, and you can check
>>> to see what the PGE commands actually translate into, with respect to a
>>> shell script format. If that file is there, take a look at it, and validate
>>> whether the command works within the script (i.e. copy/paste and run the
>>> crawler command manually).
>>>
>>> Another suggestion is to take a step back, and build up slowly, i.e.:
>>> 1. Do an "echo" command within your PGE first (e.g. <cmd> echo "Hello
>>>    APL." > /tmp/test.txt</cmd>).
>>> 2. If the above works, do an empty crawler_launcher command (e.g.
>>>    <cmd>/path/to/oodt/crawler/bin/crawler_launcher</cmd>) and verify the
>>>    batch_stub or Workflow Manager prints some kind of output when you run
>>>    the workflow.
>>> 3. Build up your crawler_launcher command piece by piece to see where it
>>>    is failing.
>>>
>>> Thanks,
>>> Rishi
>>>
>>> On Oct 8, 2014, at 4:24 PM, Mallder, Valerie <[email protected]> wrote:
>>>
>>>> Hi Rishi,
>>>>
>>>> Thank you very much for pointing me to your working example. This is
>>>> very helpful. My pgeConfig looks very similar to yours. So, I commented
>>>> out the resource manager like you suggested and tried running again
>>>> without the resource manager. And my problem still exists. The problem is
>>>> that the ExternScriptTaskInstance is unable to recognize the command line
>>>> arguments that I want to pass to the crawler_launcher script. Could you
>>>> send me a link to your tasks.xml file? I'm curious as to how you defined
>>>> your task. My pgeConfig and tasks.xml are below.
>>>>
>>>> Thanks!
>>>> Val
>>>>
>>>>
>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>> <pgeConfig>
>>>>
>>>>   <!-- How to run the PGE -->
>>>>   <exe dir="[JobDir]" shell="/bin/sh" envReplace="true">
>>>>     <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation --launchAutoCrawler \
>>>>       --filemgrUrl [FILEMGR_URL] \
>>>>       --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
>>>>       --productPath [JobInputDir] \
>>>>       --mimeExtractorRepo [OODT_HOME]/extensions/policy/mime-extractor-map.xml \
>>>>       --actionIds MoveFileToLevel0Dir</cmd>
>>>>   </exe>
>>>>
>>>>   <!-- Files to ingest -->
>>>>   <output/>
>>>>   </output>
>>>>
>>>>   <!-- Custom metadata to add to output files -->
>>>>   <customMetadata>
>>>>     <metadata key="JobDir" val="[OODT_HOME]"/>
>>>>     <metadata key="JobInputDir" val="[FEI_DROP_DIR]"/>
>>>>     <metadata key="JobOutputDir" val="[JobDir]/data/pge/jobs"/>
>>>>     <metadata key="JobLogDir" val="[JobDir]/data/pge/logs"/>
>>>>   </customMetadata>
>>>>
>>>> </pgeConfig>
>>>>
>>>>
>>>>
>>>> <!-- tasks.xml **************************************************-->
>>>>
>>>> <cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>>>>
>>>>   <task id="urn:oodt:crawlerLauncherId" name="crawlerLauncherName"
>>>>         class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
>>>>     <conditions/> <!-- There are no pre execution conditions right now -->
>>>>     <configuration>
>>>>
>>>>       <property name="ShellType" value="/bin/sh" />
>>>>       <property name="PathToScript"
>>>>                 value="[CRAWLER_HOME]/bin/crawler_launcher" envReplace="true" />
>>>>
>>>>       <property name="PGETask_Name" value="crawler_launcher PGE Task"/>
>>>>       <property name="PGETask_ConfigFilePath"
>>>>                 value="[OODT_HOME]/extensions/config/crawler-pge-config.xml"
>>>>                 envReplace="true" />
>>>>     </configuration>
>>>>   </task>
>>>>
>>>> </cas:tasks>
>>>>
>>>> Valerie A. Mallder
>>>> New Horizons Deputy Mission System Engineer
>>>> Johns Hopkins University/Applied Physics Laboratory
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Verma, Rishi (398J) [mailto:[email protected]]
>>>>> Sent: Wednesday, October 08, 2014 6:01 PM
>>>>> To: [email protected]
>>>>> Subject: Re: what is batch stub? Is it necessary?
>>>>>
>>>>> Hi Valerie,
>>>>>
>>>>>>>>> All I am trying to do is run "crawler_launcher" as a workflow task
>>>>>>>>> in the CAS PGE environment.
>>>>>
>>>>> Interesting. I have a working example here [1] you can look at that does
>>>>> this exact thing.
>>>>>
>>>>>>>>> So, if "batchstub" is necessary in this scenario, please tell me
>>>>>>>>> what it is, why it is necessary, and how to run it (please provide
>>>>>>>>> exact syntax to put in my startup shell script, because I would
>>>>>>>>> never be able to figure it out for myself and I don't want to have
>>>>>>>>> to bother everyone again.)
>>>>>
>>>>> Batchstub is only necessary if your Workflow Manager is sending jobs to
>>>>> Resource Manager for execution (where the default execution is to run the
>>>>> job in something called a "batch stub" executable). Think of batch stubs
>>>>> as a small wrapper program that takes a bundle of executable instructions
>>>>> from Resource Manager, and executes them in a shell environment within a
>>>>> given remote (or local) machine.
>>>>>
>>>>> Here's my suggestion:
>>>>> 1. Like Paul suggested, go to $OODT_HOME/resmgr/bin, and execute the
>>>>> following command (it'll start a batch stub in a terminal on port 2001):
>>>>>> ./batch_stub 2001
>>>>>
>>>>> If the above step doesn't fix your problem, you can also try having
>>>>> Workflow Manager NOT send jobs to Resource Manager for execution, and
>>>>> instead execute jobs locally through Workflow Manager itself (on
>>>>> localhost only!).
>>>>> To disable job transfer to Resource Manager, you'll need to modify the
>>>>> Workflow Manager properties file
>>>>> ($OODT_HOME/wmgr/etc/workflow.properties), and specifically comment out
>>>>> the "org.apache.oodt.cas.workflow.engine.resourcemgr.url" line.
>>>>> I've done this in my example code below, see [2] for an exact example of
>>>>> this. After modifying workflow.properties, make sure to restart workflow
>>>>> manager ($OODT_HOME/wmgr/bin/wmgr stop followed by
>>>>> $OODT_HOME/wmgr/bin/wmgr start).
>>>>>
>>>>> Thanks,
>>>>> Rishi
>>>>>
>>>>> [1] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/pge/src/main/resources/policy/netscan-getipv4entriesrandomsample.xml
>>>>> [2] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/etc/workflow.properties
>>>>>
>>>>> On Oct 8, 2014, at 2:31 PM, Ramirez, Paul M (398J) <[email protected]> wrote:
>>>>>
>>>>>> Valerie,
>>>>>>
>>>>>> I would have thought it would have just not used a batch stub by
>>>>>> default. That said if you go into the $OODT_HOME/resmgr/bin there should
>>>>>> be a script to start a batch stub. Right now on my phone I forget the
>>>>>> name of the script but if you more the file you will see the Java class
>>>>>> name that corresponds to below. You should specify a port when you run
>>>>>> the script which from the looks of the output below should be 2001.
>>>>>>
>>>>>> HTH,
>>>>>> Paul R
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>>> On Oct 8, 2014, at 2:04 PM, Mallder, Valerie <[email protected]> wrote:
>>>>>>>
>>>>>>> Well then, I'm proud to be a member :) (I think .... )
>>>>>>>
>>>>>>>
>>>>>>> Valerie A. Mallder
>>>>>>> New Horizons Deputy Mission System Engineer
>>>>>>> Johns Hopkins University/Applied Physics Laboratory
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Bruce Barkstrom [mailto:[email protected]]
>>>>>>>> Sent: Wednesday, October 08, 2014 4:54 PM
>>>>>>>> To: [email protected]
>>>>>>>> Subject: Re: what is batch stub? Is it necessary?
>>>>>>>>
>>>>>>>> You have every right to bother everyone.
>>>>>>>> You won't get what you need unless you do.
>>>>>>>>
>>>>>>>> You get one honorary membership in the Society of General Agitators
>>>>>>>> - at the rank of Major Agitator.
>>>>>>>>
>>>>>>>> Bruce B.
>>>>>>>>
>>>>>>>> On Wed, Oct 8, 2014 at 4:49 PM, Mallder, Valerie <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I am still having trouble getting my CAS PGE crawler task to run
>>>>>>>>> due to http://localhost:2001 being "down". I have spent the last 2
>>>>>>>>> days tracing through the resource manager code and tracked this down
>>>>>>>>> to line 146 of LRUScheduler where the XmlRpcBatchMgr is failing to
>>>>>>>>> execute the task remotely, because line 75 of XmlRpcBatchMgrProxy
>>>>>>>>> (that was instantiated by XmlRpcBatchMgr on its line 74) is trying
>>>>>>>>> to call "isAlive" on the webservice named "batchstub" which, to my
>>>>>>>>> knowledge, is not running because I have not done anything
>>>>>>>>> explicitly to run it.
>>>>>>>>>
>>>>>>>>> All I am trying to do is run "crawler_launcher" as a workflow task
>>>>>>>>> in the CAS PGE environment. I had it running perfectly before I
>>>>>>>>> started trying to make it run as part of a workflow.
>>>>>>>>> I really miss my crawler and really want it to run again :(
>>>>>>>>>
>>>>>>>>> So, if "batchstub" is necessary in this scenario, please tell me
>>>>>>>>> what it is, why it is necessary, and how to run it (please provide
>>>>>>>>> exact syntax to put in my startup shell script, because I would
>>>>>>>>> never be able to figure it out for myself and I don't want to have
>>>>>>>>> to bother everyone again.)
>>>>>>>>>
>>>>>>>>> Thanks so much!
>>>>>>>>>
>>>>>>>>> Val
>>>>>>>>>
>>>>>>>>> Valerie A. Mallder
>>>>>>>>> New Horizons Deputy Mission System Engineer
>>>>>>>>> The Johns Hopkins University/Applied Physics Laboratory
>>>>>>>>> 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
>>>>>>>>> 240-228-7846 (Office) 410-504-2233 (Blackberry)
>>>>>
>>>>> ---
>>>>> Rishi Verma
>>>>> NASA Jet Propulsion Laboratory
>>>>> California Institute of Technology
>>>
>>> ---
>>> Rishi Verma
>>> NASA Jet Propulsion Laboratory
>>> California Institute of Technology
>>
>>
>> --
>> *Lewis*
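
For anyone finding this thread later, here is a rough shell sketch of the second option Rishi describes: commenting out the org.apache.oodt.cas.workflow.engine.resourcemgr.url line so that Workflow Manager runs jobs locally. The function name disable_resmgr_url is made up for illustration and is not part of OODT. The sketch assumes the property sits at the start of a line (optionally indented) and that "#" starts a comment, as in standard Java properties files. Treat it as a starting point, not a tested OODT procedure.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OODT): comment out the Resource
# Manager URL in a workflow.properties file so that Workflow Manager stops
# handing jobs to Resource Manager.
disable_resmgr_url() {
    props_file="$1"
    key='org.apache.oodt.cas.workflow.engine.resourcemgr.url'
    # Escape the dots so sed matches them literally.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    # Write to a temp file and move it into place; this avoids relying on
    # "sed -i", whose syntax differs between platforms.
    tmp_file="${props_file}.tmp.$$"
    # Only lines that start with the key (after optional whitespace) are
    # changed, so an already commented line is left alone.
    sed "s/^\([[:space:]]*\)\(${pattern}\)/\1#\2/" "$props_file" > "$tmp_file" \
        && mv "$tmp_file" "$props_file"
}
```

It would be called as disable_resmgr_url "$OODT_HOME/wmgr/etc/workflow.properties", followed by the restart mentioned above ($OODT_HOME/wmgr/bin/wmgr stop, then $OODT_HOME/wmgr/bin/wmgr start). Keeping a backup copy of the properties file first is a sensible precaution.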
