Hi Val,

I don't think you need to run a CAS-PGE task to call
crawler_launcher. If you define blocks in the <output>..</output>
section of the XML file, a crawler will be forked in the job working
directory of CAS-PGE and crawl your specified output.

I believe that will accomplish the same goal you are looking for.

No need to have crawling be a separate task from CAS-PGE - CAS-PGE will
do the crawling for you! :)
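
For example, a minimal <output> section might look roughly like the
sketch below (the attribute names here are from memory of the CAS-PGE
pgeConfig format, so double-check them against the docs/examples for
your version of OODT):

   <output>
      <dir path="[JobOutputDir]" createBeforeExe="true"/>
   </output>

With a block like that in place, CAS-PGE forks the crawler for you and
crawls the specified output directory once your PGE finishes.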

Cheers,
Chris

------------------------
Chris Mattmann
[email protected]




-----Original Message-----
From: "Verma, Rishi (398J)" <[email protected]>
Reply-To: <[email protected]>
Date: Thursday, October 9, 2014 at 2:44 AM
To: "[email protected]" <[email protected]>
Subject: Re: what is batch stub? Is it necessary?

>Hi Val,
>
>Yep - here's a link to the tasks.xml file:
>https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/work
>flow/src/main/resources/policy/tasks.xml
>
>> The problem is that the ExternScriptTaskInstance is unable to recognize
>>the command line arguments that I want to pass to the crawler_launcher
>>script. 
>
>
>Hmm.. could you share your workflow manager log, or better yet, the
>batch_stub output? Curious to see what error is thrown.
>
>Is a script file being generated for your PGE? For example, inside your
>[PGE_HOME] directory, and within the particular job directory created for
>your execution of a workflow, you will see some files starting with
>"sciPgeExeScript_". You'll find one for your pgeConfig, and you can
>check to see what the PGE commands actually translate into, with respect
>to a shell script format. If that file is there, take a look at it, and
>validate whether the command works within the script (i.e. copy/paste and
>run the crawler command manually).
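>
>A quick way to do that check by hand is something like the following
>(the path is illustrative - substitute the actual job directory that
>CAS-PGE created for your run):
>
>  cd [PGE_HOME]/<your job directory>
>  cat sciPgeExeScript_*     # see what CAS-PGE actually generated
>  sh sciPgeExeScript_*      # run it manually to surface any error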
>
>Another suggestion is to take a step back, and build up slowly, i.e.:
>1. Do an "echo" command within your PGE first (e.g. <cmd>echo "Hello
>APL." > /tmp/test.txt</cmd>) - see the sketch after this list.
>2. If the above works, do an empty crawler_launcher command (e.g.
><cmd>/path/to/oodt/crawler/bin/crawler_launcher</cmd>) and verify that the
>batch_stub or Workflow Manager prints some kind of output when you run
>the workflow.
>3. Build up your crawler_launcher command piece by piece to see where it
>is failing.
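>
>For step 1, the <exe> block in your pgeConfig would look something like
>this minimal sketch (reusing the dir/shell settings from your own config
>below):
>
>  <exe dir="[JobDir]" shell="/bin/sh" envReplace="true">
>     <cmd>echo "Hello APL." > /tmp/test.txt</cmd>
>  </exe>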
>
>Thanks,
>Rishi
>
>On Oct 8, 2014, at 4:24 PM, Mallder, Valerie <[email protected]>
>wrote:
>
>> Hi Rishi,
>> 
>> Thank you very much for pointing me to your working example. This is
>>very helpful.  My pgeConfig looks very similar to yours.  So, I
>>commented out the resource manager like you suggested and tried running
>>again without the resource manager. And my problem still exists. The
>>problem is that the ExternScriptTaskInstance is unable to recognize the
>>command line arguments that I want to pass to the crawler_launcher
>>script. Could you send me a link to your tasks.xml file? I'm curious as
>>to how you defined your task.  My pgeConfig and tasks.xml are below.
>> 
>> Thanks!
>> Val
>> 
>> 
>> <?xml version="1.0" encoding="UTF-8"?>
>> <pgeConfig>
>> 
>>   <!-- How to run the PGE -->
>>   <exe dir="[JobDir]" shell="/bin/sh" envReplace="true">
>>        <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation
>>--launchAutoCrawler \
>>        --filemgrUrl [FILEMGR_URL] \
>>        --clientTransferer
>>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
>>        --productPath [JobInputDir] \
>>        --mimeExtractorRepo
>>[OODT_HOME]/extensions/policy/mime-extractor-map.xml \
>>        --actionIds MoveFileToLevel0Dir</cmd>
>>   </exe>
>> 
>>   <!-- Files to ingest -->
>>   <output/>
>> 
>> <!-- Custom metadata to add to output files -->
>>   <customMetadata>
>>      <metadata key="JobDir" val="[OODT_HOME]"/>
>>      <metadata key="JobInputDir" val="[FEI_DROP_DIR]"/>
>>      <metadata key="JobOutputDir" val="[JobDir]/data/pge/jobs"/>
>>      <metadata key="JobLogDir" val="[JobDir]/data/pge/logs"/>
>>   </customMetadata>
>> 
>> </pgeConfig>
>> 
>> 
>> 
>> <!-- tasks.xml **************************************************-->
>> 
>>   <cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>> 
>>   <task id="urn:oodt:crawlerLauncherId" name="crawlerLauncherName"
>>class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
>>      <conditions/>  <!-- There are no pre execution conditions right
>>now -->
>>      <configuration>
>> 
>>          <property name="ShellType" value="/bin/sh" />
>>          <property name="PathToScript"
>>value="[CRAWLER_HOME]/bin/crawler_launcher" envReplace="true" />
>> 
>>          <property name="PGETask_Name" value="crawler_launcher PGE
>>Task"/>
>>          <property name="PGETask_ConfigFilePath"
>>value="[OODT_HOME]/extensions/config/crawler-pge-config.xml"
>>envReplace="true" />
>>      </configuration>
>>   </task>
>> 
>> </cas:tasks>
>> 
>> Valerie A. Mallder
>> New Horizons Deputy Mission System Engineer
>> Johns Hopkins University/Applied Physics Laboratory
>> 
>> 
>>> -----Original Message-----
>>> From: Verma, Rishi (398J) [mailto:[email protected]]
>>> Sent: Wednesday, October 08, 2014 6:01 PM
>>> To: [email protected]
>>> Subject: Re: what is batch stub? Is it necessary?
>>> 
>>> Hi Valerie,
>>> 
>>>>>>> All I am trying to do is run "crawler_launcher" as a workflow task
>>>>>>> in the CAS PGE environment.
>>> 
>>> Interesting. I have a working example here [1] you can look at that
>>>does this exact
>>> thing.
>>> 
>>>>>>> So, if "batchstub" is necessary in this scenario, please tell me
>>>>>>> what it is, why it is necessary, and how to run it (please provide
>>>>>>> exact syntax to put in my startup shell script, because I would
>>>>>>> never be able to figure it out for myself and I don't want to have
>>>>>>> to bother everyone again.)
>>> 
>>> Batchstub is only necessary if your Workflow Manager is sending jobs to
>>> Resource Manager for execution (where the default execution is to run
>>> the job in something called a "batch stub" executable). Think of batch
>>> stubs as a small wrapper program that takes a bundle of executable
>>> instructions from Resource Manager, and executes them in a shell
>>> environment within a given remote (or local) machine.
>>> 
>>> Here's my suggestion:
>>> 1. Like Paul suggested, go to $OODT_HOME/resmgr/bin, and execute the
>>> following command (it'll start a batch stub in a terminal on port
>>> 2001):
>>>> ./batch_stub 2001
>>> 
>>> If the above step doesn't fix your problem, you can also try having
>>> Workflow Manager NOT send jobs to Resource Manager for execution, and
>>> instead execute jobs locally through Workflow Manager itself (on
>>> localhost only!). To disable job transfer to Resource Manager, you'll
>>> need to modify the Workflow Manager properties file
>>> ($OODT_HOME/wmgr/etc/workflow.properties), and specifically comment out
>>> the "org.apache.oodt.cas.workflow.engine.resourcemgr.url" line.
>>> I've done this in my example code below, see [2] for an exact example
>>> of this. After modifying workflow.properties, make sure to restart
>>> Workflow Manager ($OODT_HOME/wmgr/bin/wmgr stop followed by
>>> $OODT_HOME/wmgr/bin/wmgr start).
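>>> 
>>> Concretely, after that edit the relevant line in workflow.properties
>>> would look roughly like this (the URL value shown here is just a
>>> placeholder - yours may differ):
>>> 
>>>   # org.apache.oodt.cas.workflow.engine.resourcemgr.url=http://localhost:9002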
>>> 
>>> Thanks,
>>> Rishi
>>> 
>>> [1] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-
>>> 
>>>netscan/pge/src/main/resources/policy/netscan-getipv4entriesrandomsample
>>>.xml
>>> [2] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-
>>> netscan/workflow/src/main/resources/etc/workflow.properties
>>> 
>>> On Oct 8, 2014, at 2:31 PM, Ramirez, Paul M (398J)
>>> <[email protected]> wrote:
>>> 
>>>> Valerie,
>>>> 
>>>> I would have thought it would have just not used a batch stub by
>>>> default. That said, if you go into the $OODT_HOME/resmgr/bin there
>>>> should be a script to start a batch stub. Right now on my phone I
>>>> forget the name of the script, but if you 'more' the file you will see
>>>> the Java class name that corresponds to below. You should specify a
>>>> port when you run the script, which from the looks of the output below
>>>> should be 2001.
>>>> 
>>>> HTH,
>>>> Paul R
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>> On Oct 8, 2014, at 2:04 PM, Mallder, Valerie
>>>>><[email protected]>
>>> wrote:
>>>>> 
>>>>> Well then, I'm proud to be a member :)  (I think .... )
>>>>> 
>>>>> 
>>>>> Valerie A. Mallder
>>>>> New Horizons Deputy Mission System Engineer Johns Hopkins
>>>>> University/Applied Physics Laboratory
>>>>> 
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Bruce Barkstrom [mailto:[email protected]]
>>>>>> Sent: Wednesday, October 08, 2014 4:54 PM
>>>>>> To: [email protected]
>>>>>> Subject: Re: what is batch stub? Is it necessary?
>>>>>> 
>>>>>> You have every right to bother everyone.
>>>>>> You won't get what you need unless you do.
>>>>>> 
>>>>>> You get one honorary membership in the Society of General Agitators
>>>>>> - at the rank of Major Agitator.
>>>>>> 
>>>>>> Bruce B.
>>>>>> 
>>>>>> On Wed, Oct 8, 2014 at 4:49 PM, Mallder, Valerie
>>>>>> <[email protected]
>>>>>>> wrote:
>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I am still having trouble getting my CAS PGE crawler task to run
>>>>>>> due to
>>>>>>> http://localhost:2001 being "down". I have spent the last 2 days
>>>>>>> tracing through the resource manager code and tracked this down to
>>>>>>> line 146 of LRUScheduler where the XmlRpcBatchMgr is failing to
>>>>>>> execute the task remotely, because line 75 of
>>>>>>> XmlRpcBatchMgrProxy (which was instantiated by XmlRpcBatchMgr on its
>>>>>>> line 74) tries to call "isAlive" on the webservice named
>>>>>>> "batchstub", which, to my knowledge, is not running because I have
>>>>>>> not done anything explicitly to run it.
>>>>>>> 
>>>>>>> All I am trying to do is run "crawler_launcher" as a workflow task
>>>>>>> in the CAS PGE environment.  I had it running perfectly before I
>>>>>>> started trying to make it run as part of a workflow.  I really miss
>>>>>>> my crawler and really want it to run again :(
>>>>>>> 
>>>>>>> So, if "batchstub" is necessary in this scenario, please tell me
>>>>>>> what it is, why it is necessary, and how to run it (please provide
>>>>>>> exact syntax to put in my startup shell script, because I would
>>>>>>> never be able to figure it out for myself and I don't want to have
>>>>>>> to bother everyone again.)
>>>>>>> 
>>>>>>> Thanks so much!
>>>>>>> 
>>>>>>> Val
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Valerie A. Mallder
>>>>>>> 
>>>>>>> New Horizons Deputy Mission System Engineer The Johns Hopkins
>>>>>>> University/Applied Physics Laboratory
>>>>>>> 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
>>>>>>> 240-228-7846 (Office) 410-504-2233 (Blackberry)
>>>>>>> 
>>>>>>> 
>>> 
>>> ---
>>> Rishi Verma
>>> NASA Jet Propulsion Laboratory
>>> California Institute of Technology
>> 
>
>---
>Rishi Verma
>NASA Jet Propulsion Laboratory
>California Institute of Technology
>

