What are XSDs? And what do you mean by "restrictions"? Do you mean 'definitions' of what can be included within the task and configuration XML files? If so, then I totally agree with you.

Valerie A. Mallder
New Horizons Deputy Mission System Engineer
Johns Hopkins University/Applied Physics Laboratory

> -----Original Message-----
> From: Ramirez, Paul M (398J) [mailto:[email protected]]
> Sent: Wednesday, October 08, 2014 10:38 PM
> To: <[email protected]>
> Subject: Re: what is batch stub? Is it necessary?
>
> +1 billion
>
> --Paul
>
> Sent from my iPhone
>
> > On Oct 8, 2014, at 5:55 PM, Lewis John Mcgibbney <[email protected]> wrote:
> >
> > Folks,
> > Is it possible to create a parent issue for defining XSDs for all of
> > the XML files we need in OODT?
> > I do not know them all, but from this thread alone, it is clear that
> > we could do with setting some kind of restrictions on what can be
> > included within task and configuration XML within OODT.
> > Thoughts?
> > Lewis
> >
> > On Wed, Oct 8, 2014 at 5:44 PM, Verma, Rishi (398J) <[email protected]> wrote:
> >
> >> Hi Val,
> >>
> >> Yep - here's a link to the tasks.xml file:
> >>
> >> https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/policy/tasks.xml
> >>
> >>> The problem is that the ExternScriptTaskInstance is unable to
> >>> recognize the command line arguments that I want to pass to the
> >>> crawler_launcher script.
> >>
> >> Hmm.. could you share your workflow manager log, or better yet, the
> >> batch_stub output? Curious to see what error is thrown.
> >>
> >> Is a script file being generated for your PGE? For example, inside
> >> your [PGE_HOME] directory, and within the particular job directory
> >> created for your execution of a workflow, you will see some files
> >> starting with "sciPgeExeScript_...". You'll find one for your
> >> pgeConfig, and you can check to see what the PGE commands actually
> >> translate into, with respect to a shell script format. If that file
> >> is there, take a look at it, and validate whether the command works
> >> within the script (i.e.
> >> copy/paste and run the crawler command manually).
> >>
> >> Another suggestion is to take a step back, and build up slowly, i.e.:
> >> 1. Do an "echo" command within your PGE first. (e.g. <cmd> echo
> >> "Hello APL." > /tmp/test.txt</cmd>)
> >> 2. If above works, do a crawler_launcher empty command (e.g.
> >> <cmd>/path/to/oodt/crawler/bin/crawler_launcher</cmd>) and verify the
> >> batch_stub or Workflow Manager prints some kind of output when you
> >> run the workflow.
> >> 3. Build up your crawler_launcher command piece by piece to see where
> >> it is failing
> >>
> >> Thanks,
> >> Rishi
> >>
> >> On Oct 8, 2014, at 4:24 PM, Mallder, Valerie <[email protected]> wrote:
> >>
> >>> Hi Rishi,
> >>>
> >>> Thank you very much for pointing me to your working example. This is
> >>> very helpful. My pgeConfig looks very similar to yours. So, I
> >>> commented out the resource manager like you suggested and tried
> >>> running again without the resource manager. And my problem still
> >>> exists. The problem is that the ExternScriptTaskInstance is unable to
> >>> recognize the command line arguments that I want to pass to the
> >>> crawler_launcher script. Could you send me a link to your tasks.xml file?
> >>> I'm curious as to how you defined your task.
> >>> My pgeConfig and tasks.xml are below.
> >>>
> >>> Thanks!
> >>> Val
> >>>
> >>>
> >>> <?xml version="1.0" encoding="UTF-8"?>
> >>> <pgeConfig>
> >>>
> >>>   <!-- How to run the PGE -->
> >>>   <exe dir="[JobDir]" shell="/bin/sh" envReplace="true">
> >>>     <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation --launchAutoCrawler \
> >>>       --filemgrUrl [FILEMGR_URL] \
> >>>       --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
> >>>       --productPath [JobInputDir] \
> >>>       --mimeExtractorRepo [OODT_HOME]/extensions/policy/mime-extractor-map.xml \
> >>>       --actionIds MoveFileToLevel0Dir</cmd>
> >>>   </exe>
> >>>
> >>>   <!-- Files to ingest -->
> >>>   <output/>
> >>>   </output>
> >>>
> >>>   <!-- Custom metadata to add to output files -->
> >>>   <customMetadata>
> >>>     <metadata key="JobDir" val="[OODT_HOME]"/>
> >>>     <metadata key="JobInputDir" val="[FEI_DROP_DIR]"/>
> >>>     <metadata key="JobOutputDir" val="[JobDir]/data/pge/jobs"/>
> >>>     <metadata key="JobLogDir" val="[JobDir]/data/pge/logs"/>
> >>>   </customMetadata>
> >>>
> >>> </pgeConfig>
> >>>
> >>>
> >>>
> >>> <!-- tasks.xml **************************************************-->
> >>>
> >>> <cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
> >>>
> >>>   <task id="urn:oodt:crawlerLauncherId" name="crawlerLauncherName"
> >>>         class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
> >>>     <conditions/> <!-- There are no pre execution conditions right now -->
> >>>     <configuration>
> >>>
> >>>       <property name="ShellType" value="/bin/sh" />
> >>>       <property name="PathToScript" value="[CRAWLER_HOME]/bin/crawler_launcher" envReplace="true" />
> >>>
> >>>       <property name="PGETask_Name" value="crawler_launcher PGE Task"/>
> >>>       <property name="PGETask_ConfigFilePath" value="[OODT_HOME]/extensions/config/crawler-pge-config.xml" envReplace="true" />
> >>>     </configuration>
> >>>   </task>
> >>>
> >>> </cas:tasks>
> >>>
> >>> Valerie A. Mallder
> >>> New Horizons Deputy Mission System Engineer
> >>> Johns Hopkins University/Applied Physics Laboratory
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Verma, Rishi (398J) [mailto:[email protected]]
> >>>> Sent: Wednesday, October 08, 2014 6:01 PM
> >>>> To: [email protected]
> >>>> Subject: Re: what is batch stub? Is it necessary?
> >>>>
> >>>> Hi Valerie,
> >>>>
> >>>>>>>> All I am trying to do is run "crawler_launcher" as a workflow
> >>>>>>>> task in the CAS PGE environment.
> >>>>
> >>>> Interesting. I have a working example here [1] you can look at that
> >>>> does this exact thing.
> >>>>
> >>>>>>>> So, if "batchstub" is necessary in this scenario, please tell
> >>>>>>>> me what it is, why it is necessary, and how to run it (please
> >>>>>>>> provide exact syntax to put in my startup shell script, because
> >>>>>>>> I would never be able to figure it out for myself and I don't
> >>>>>>>> want to have to bother everyone again.)
> >>>>
> >>>> Batchstub is only necessary if your Workflow Manager is sending jobs
> >>>> to Resource Manager for execution (where the default execution is to
> >>>> run the job in something called a "batch stub" executable). Think of
> >>>> batch stubs as a small wrapper program that takes a bundle of
> >>>> executable instructions from Resource Manager, and executes them in a
> >>>> shell environment within a given remote (or local) machine.
> >>>>
> >>>> Here's my suggestion:
> >>>> 1. Like Paul suggested, go to $OODT_HOME/resmgr/bin, and execute
> >>>> the following command (it'll start a batch stub in a terminal on port
> >>>> 2001):
> >>>>> ./batch_stub 2001
> >>>>
> >>>> If the above step doesn't fix your problem, you can also try having
> >>>> Workflow Manager NOT send jobs to Resource Manager for execution, and
> >>>> instead execute jobs locally through Workflow Manager itself (on
> >>>> localhost only!).
> >>>> To disable job transfer to Resource Manager, you'll need to modify
> >>>> the Workflow Manager properties file
> >>>> ($OODT_HOME/wmgr/etc/workflow.properties), and specifically comment
> >>>> out the "org.apache.oodt.cas.workflow.engine.resourcemgr.url" line.
> >>>> I've done this in my example code below, see [2] for an exact
> >>>> example of this.
> >>>> After modifying workflow.properties, make sure to restart workflow
> >>>> manager ($OODT_HOME/wmgr/bin/wmgr stop followed by
> >>>> $OODT_HOME/wmgr/bin/wmgr start).
> >>>>
> >>>> Thanks,
> >>>> Rishi
> >>>>
> >>>> [1] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/pge/src/main/resources/policy/netscan-getipv4entriesrandomsample.xml
> >>>> [2] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/etc/workflow.properties
> >>>>
> >>>> On Oct 8, 2014, at 2:31 PM, Ramirez, Paul M (398J) <[email protected]> wrote:
> >>>>
> >>>>> Valerie,
> >>>>>
> >>>>> I would have thought it would have just not used a batch stub by
> >>>>> default. That said if you go into the $OODT_HOME/resmgr/bin there
> >>>>> should be a script to start a batch stub. Right now on my phone I
> >>>>> forget the name of the script but if you more the file you will see
> >>>>> the Java class name that corresponds to below. You should specify a
> >>>>> port when you run the script which from the looks of the output
> >>>>> below should be 2001.
> >>>>>
> >>>>> HTH,
> >>>>> Paul R
> >>>>>
> >>>>> Sent from my iPhone
> >>>>>
> >>>>>> On Oct 8, 2014, at 2:04 PM, Mallder, Valerie <[email protected]> wrote:
> >>>>>>
> >>>>>> Well then, I'm proud to be a member :) (I think .... )
> >>>>>>
> >>>>>>
> >>>>>> Valerie A. Mallder
> >>>>>> New Horizons Deputy Mission System Engineer
> >>>>>> Johns Hopkins University/Applied Physics Laboratory
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Bruce Barkstrom [mailto:[email protected]]
> >>>>>>> Sent: Wednesday, October 08, 2014 4:54 PM
> >>>>>>> To: [email protected]
> >>>>>>> Subject: Re: what is batch stub? Is it necessary?
> >>>>>>>
> >>>>>>> You have every right to bother everyone.
> >>>>>>> You won't get what you need unless you do.
> >>>>>>>
> >>>>>>> You get one honorary membership in the Society of General
> >>>>>>> Agitators - at the rank of Major Agitator.
> >>>>>>>
> >>>>>>> Bruce B.
> >>>>>>>
> >>>>>>> On Wed, Oct 8, 2014 at 4:49 PM, Mallder, Valerie <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> I am still having trouble getting my CAS PGE crawler task to
> >>>>>>>> run due to http://localhost:2001 being "down". I have spent the
> >>>>>>>> last 2 days tracing through the resource manager code and
> >>>>>>>> tracked this down to line 146 of LRUScheduler where the
> >>>>>>>> XmlRpcBatchMgr is failing to execute the task remotely, because
> >>>>>>>> on line 75 of XmlRpcBatchMgrProxy (that was instantiated by
> >>>>>>>> XmlRpcBatchMgr on its line 74) is trying to call "isAlive" on
> >>>>>>>> the webservice named "batchstub" which, to my knowledge, is not
> >>>>>>>> running because I have not done anything explicitly to run it.
> >>>>>>>>
> >>>>>>>> All I am trying to do is run "crawler_launcher" as a workflow
> >>>>>>>> task in the CAS PGE environment. I had it running perfectly
> >>>>>>>> before I started trying to make it run as part of a workflow.
> >>>>>>>> I really miss my crawler and really want it to run again :(
> >>>>>>>>
> >>>>>>>> So, if "batchstub" is necessary in this scenario, please tell
> >>>>>>>> me what it is, why it is necessary, and how to run it (please
> >>>>>>>> provide exact syntax to put in my startup shell script, because
> >>>>>>>> I would never be able to figure it out for myself and I don't
> >>>>>>>> want to have to bother everyone again.)
> >>>>>>>>
> >>>>>>>> Thanks so much!
> >>>>>>>>
> >>>>>>>> Val
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Valerie A. Mallder
> >>>>>>>>
> >>>>>>>> New Horizons Deputy Mission System Engineer
> >>>>>>>> The Johns Hopkins University/Applied Physics Laboratory
> >>>>>>>> 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
> >>>>>>>> 240-228-7846 (Office) 410-504-2233 (Blackberry)
> >>>>
> >>>> ---
> >>>> Rishi Verma
> >>>> NASA Jet Propulsion Laboratory
> >>>> California Institute of Technology
> >>
> >> ---
> >> Rishi Verma
> >> NASA Jet Propulsion Laboratory
> >> California Institute of Technology
> >
> >
> > --
> > *Lewis*
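
For readers following the XSD question at the top of this thread: an XSD (XML Schema Definition) is itself an XML document that declares which elements and attributes another XML file is allowed to contain, which is the kind of "restriction" being proposed. The following is only a minimal sketch, assuming the tasks.xml layout quoted earlier in the thread. It is not an official OODT schema, the file name tasks.xsd is hypothetical, and it covers only the elements and attributes that appear in this thread (a real tasks.xml may use others that this sketch would reject).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- tasks.xsd (hypothetical name): illustrative only, not shipped with OODT -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://oodt.jpl.nasa.gov/1.0/cas"
           elementFormDefault="unqualified">

  <!-- Root element <cas:tasks>; its children are unprefixed, as in the thread -->
  <xs:element name="tasks">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="task" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- Pre-execution conditions; contents left open in this sketch -->
              <xs:element name="conditions" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <!-- Task configuration: a flat list of name/value properties -->
              <xs:element name="configuration" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="property" minOccurs="0" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:attribute name="name" type="xs:string" use="required"/>
                        <xs:attribute name="value" type="xs:string" use="required"/>
                        <xs:attribute name="envReplace" type="xs:boolean"/>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <!-- Every task must declare an id, a name, and an implementing class -->
            <xs:attribute name="id" type="xs:anyURI" use="required"/>
            <xs:attribute name="name" type="xs:string" use="required"/>
            <xs:attribute name="class" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Assuming xmllint (from libxml2) is installed, a file could be checked against such a schema with: xmllint --noout --schema tasks.xsd tasks.xml. A misspelled attribute or a task missing its class would then be reported before the Workflow Manager ever loads the file.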
