Hi Val,

Yep - here’s a link to the tasks.xml file: 
https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/policy/tasks.xml

> The problem is that the ExternScriptTaskInstance is unable to recognize the 
> command line arguments that I want to pass to the crawler_launcher script. 


Hmm.. could you share your workflow manager log, or better yet, the batch_stub 
output? Curious to see what error is thrown.

Is a script file being generated for your PGE? For example, inside your 
[PGE_HOME] directory, and within the particular job directory created for your 
execution of a workflow, you will see some files starting with 
“sciPgeExeScript_…”. You’ll find one for your pgeConfig, and you can check to 
see what the PGE commands actually translate into, with respect to a shell 
script format. If that file is there, take a look at it, and validate whether 
the command works within the script (i.e. copy/paste and run the crawler 
command manually).
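
A minimal sketch of that check, assuming you pass in the job directory yourself (the helper name show_pge_scripts is made up for illustration and is not part of OODT):

```shell
#!/bin/sh
# Print every generated PGE wrapper script found in a job directory.
# Assumption: the job directory path is given as the first argument.
show_pge_scripts() {
    job_dir="$1"
    for script in "$job_dir"/sciPgeExeScript_*; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -f "$script" ] || continue
        echo "== $script"
        cat "$script"
    done
}
```

Calling show_pge_scripts on your job directory prints each script. You can then run one with "sh -x" to trace exactly which command line the shell ends up executing.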

Another suggestion is to take a step back, and build up slowly, i.e.:
1. Do an “echo” command within your PGE first (e.g. <cmd>echo "Hello APL." > 
/tmp/test.txt</cmd>).
2. If the above works, run an empty crawler_launcher command (e.g. 
<cmd>/path/to/oodt/crawler/bin/crawler_launcher</cmd>) and verify that the 
batch_stub or Workflow Manager prints some kind of output when you run the 
workflow.
3. Build up your crawler_launcher command piece by piece to see where it is 
failing.
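
If it helps, the same build-up can be tried by hand in a terminal before touching the pgeConfig. This is only a sketch: try_step is a hypothetical helper, not an OODT tool, and it assumes CRAWLER_HOME is set in your shell.

```shell
#!/bin/sh
# Run one candidate command, capture stdout and stderr together,
# and report its exit status so each added argument can be checked in turn.
try_step() {
    label="$1"
    shift
    output=$("$@" 2>&1)
    status=$?
    echo "[$label] exit=$status"
    if [ -n "$output" ]; then
        echo "$output"
    fi
    return "$status"
}

# Example progression (uncomment one line at a time):
# try_step echo-test echo "Hello APL."
# try_step bare-launcher "$CRAWLER_HOME/bin/crawler_launcher"
# try_step with-operation "$CRAWLER_HOME/bin/crawler_launcher" --operation --launchAutoCrawler
```

The first step that reports a non-zero exit status, or prints a usage message instead of running, is where the argument handling breaks.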

Thanks,
Rishi

On Oct 8, 2014, at 4:24 PM, Mallder, Valerie <[email protected]> wrote:

> Hi Rishi,
> 
> Thank you very much for pointing me to your working example. This is very 
> helpful.  My pgeConfig looks very similar to yours.  So, I commented out the 
> resource manager like you suggested and tried running again without the 
> resource manager. And my problem still exists. The problem is that the 
> ExternScriptTaskInstance is unable to recognize the command line arguments 
> that I want to pass to the crawler_launcher script. Could you send me a link 
> to your tasks.xml file? I'm curious as to how you defined your task.  My 
> pgeConfig and tasks.xml are below.
> 
> Thanks!
> Val
> 
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <pgeConfig>
> 
>   <!-- How to run the PGE -->
>   <exe dir="[JobDir]" shell="/bin/sh" envReplace="true">
>        <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation --launchAutoCrawler \
>        --filemgrUrl [FILEMGR_URL] \
>        --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
>        --productPath [JobInputDir] \
>        --mimeExtractorRepo [OODT_HOME]/extensions/policy/mime-extractor-map.xml \
>        --actionIds MoveFileToLevel0Dir</cmd>
>   </exe>
> 
>   <!-- Files to ingest -->
>   <output/>
>   </output>
> 
> <!-- Custom metadata to add to output files -->
>   <customMetadata>
>      <metadata key="JobDir" val="[OODT_HOME]"/>
>      <metadata key="JobInputDir" val="[FEI_DROP_DIR]"/>
>      <metadata key="JobOutputDir" val="[JobDir]/data/pge/jobs"/>
>      <metadata key="JobLogDir" val="[JobDir]/data/pge/logs"/>
>   </customMetadata>
> 
> </pgeConfig>
> 
> 
> 
> <!-- tasks.xml **************************************************-->
> 
> <cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
> 
>   <task id="urn:oodt:crawlerLauncherId" name="crawlerLauncherName" 
> class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
>      <conditions/>  <!-- There are no pre execution conditions right now -->
>      <configuration>
> 
>          <property name="ShellType" value="/bin/sh" />
>          <property name="PathToScript" 
> value="[CRAWLER_HOME]/bin/crawler_launcher" envReplace="true" />
> 
>          <property name="PGETask_Name" value="crawler_launcher PGE Task"/>
>          <property name="PGETask_ConfigFilePath" 
> value="[OODT_HOME]/extensions/config/crawler-pge-config.xml" 
> envReplace="true" />
>      </configuration>
>   </task>
> 
> </cas:tasks>
> 
> Valerie A. Mallder
> New Horizons Deputy Mission System Engineer
> Johns Hopkins University/Applied Physics Laboratory
> 
> 
>> -----Original Message-----
>> From: Verma, Rishi (398J) [mailto:[email protected]]
>> Sent: Wednesday, October 08, 2014 6:01 PM
>> To: [email protected]
>> Subject: Re: what is batch stub? Is it necessary?
>> 
>> Hi Valerie,
>> 
>>>>>> All I am trying to do is run "crawler_launcher" as a workflow task
>>>>>> in the CAS PGE environment.
>> 
>> Interesting. I have a working example here [1] you can look at that does
>> this exact thing.
>> 
>>>>>> So, if "batchstub" is necessary in this scenario, please tell me
>>>>>> what it is, why it is necessary, and how to run it (please provide
>>>>>> exact syntax to put in my startup shell script, because I would
>>>>>> never be able to figure it out for myself and I don't want to have
>>>>>> to bother everyone again.)
>> 
>> Batchstub is only necessary if your Workflow Manager is sending jobs to
>> Resource Manager for execution (where the default execution is to run the
>> job in something called a "batch stub" executable). Think of batch stubs as
>> a small wrapper program that takes a bundle of executable instructions from
>> Resource Manager, and executes them in a shell environment within a given
>> remote (or local) machine.
>> 
>> Here's my suggestion:
>> 1. Like Paul suggested, go to $OODT_HOME/resmgr/bin, and execute the
>> following command (it'll start a batch stub in a terminal on port 2001):
>>> ./batch_stub 2001
>> 
>> If the above step doesn't fix your problem, you can also try having Workflow
>> Manager NOT send jobs to Resource Manager for execution, and instead execute
>> jobs locally through Workflow Manager itself (on localhost only!). To
>> disable job transfer to Resource Manager, you'll need to modify the Workflow
>> Manager properties file ($OODT_HOME/wmgr/etc/workflow.properties), and
>> specifically comment out the
>> "org.apache.oodt.cas.workflow.engine.resourcemgr.url" line. I've done this
>> in my example code below; see [2] for an exact example of this.
>> After modifying workflow.properties, make sure to restart workflow manager
>> ($OODT_HOME/wmgr/bin/wmgr stop   followed by $OODT_HOME/wmgr/bin/wmgr
>> start).
>> 
>> Thanks,
>> Rishi
>> 
>> [1] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/pge/src/main/resources/policy/netscan-getipv4entriesrandomsample.xml
>> [2] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/etc/workflow.properties
>> 
>> On Oct 8, 2014, at 2:31 PM, Ramirez, Paul M (398J)
>> <[email protected]> wrote:
>> 
>>> Valerie,
>>> 
>>> I would have thought it would have just not used a batch stub by default.
>>> That said, if you go into $OODT_HOME/resmgr/bin there should be a script
>>> to start a batch stub. Right now on my phone I forget the name of the
>>> script, but if you "more" the file you will see the Java class name that
>>> corresponds to below. You should specify a port when you run the script,
>>> which from the looks of the output below should be 2001.
>>> 
>>> HTH,
>>> Paul R
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Oct 8, 2014, at 2:04 PM, Mallder, Valerie <[email protected]> wrote:
>>>> 
>>>> Well then, I'm proud to be a member :)  (I think .... )
>>>> 
>>>> 
>>>> Valerie A. Mallder
>>>> New Horizons Deputy Mission System Engineer Johns Hopkins
>>>> University/Applied Physics Laboratory
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: Bruce Barkstrom [mailto:[email protected]]
>>>>> Sent: Wednesday, October 08, 2014 4:54 PM
>>>>> To: [email protected]
>>>>> Subject: Re: what is batch stub? Is it necessary?
>>>>> 
>>>>> You have every right to bother everyone.
>>>>> You won't get what you need unless you do.
>>>>> 
>>>>> You get one honorary membership in the Society of General Agitators
>>>>> - at the rank of Major Agitator.
>>>>> 
>>>>> Bruce B.
>>>>> 
>>>>> On Wed, Oct 8, 2014 at 4:49 PM, Mallder, Valerie <[email protected]> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> I am still having trouble getting my CAS PGE crawler task to run
>>>>>> due to http://localhost:2001 being "down". I have spent the last 2 days
>>>>>> tracing through the resource manager code and tracked this down to
>>>>>> line 146 of LRUScheduler where the XmlRpcBatchMgr is failing to
>>>>>> execute the task remotely, because line 75 of
>>>>>> XmlRpcBatchMgrProxy (that was instantiated by XmlRpcBatchMgr on its
>>>>>> line 74) is trying to call "isAlive" on the webservice named
>>>>>> "batchstub" which, to my knowledge, is not running because I have not
>>>>>> done anything explicitly to run it.
>>>>>> 
>>>>>> All I am trying to do is run "crawler_launcher" as a workflow task
>>>>>> in the CAS PGE environment.  I had it running perfectly before I
>>>>>> started trying to make it run as part of a workflow.  I really miss
>>>>>> my crawler and really want it to run again :(
>>>>>> 
>>>>>> So, if "batchstub" is necessary in this scenario, please tell me
>>>>>> what it is, why it is necessary, and how to run it (please provide
>>>>>> exact syntax to put in my startup shell script, because I would
>>>>>> never be able to figure it out for myself and I don't want to have
>>>>>> to bother everyone again.)
>>>>>> 
>>>>>> Thanks so much!
>>>>>> 
>>>>>> Val
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Valerie A. Mallder
>>>>>> 
>>>>>> New Horizons Deputy Mission System Engineer The Johns Hopkins
>>>>>> University/Applied Physics Laboratory
>>>>>> 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
>>>>>> 240-228-7846 (Office) 410-504-2233 (Blackberry)
>>>>>> 
>>>>>> 
>> 
>> ---
>> Rishi Verma
>> NASA Jet Propulsion Laboratory
>> California Institute of Technology
> 

---
Rishi Verma
NASA Jet Propulsion Laboratory
California Institute of Technology
