Thanks Chris,

The CAS-PGE is pretty complex. I've read the documentation and it is still way over my head. Is there any documentation or examples for how to integrate the crawler into it? For instance, can I still use the crawler_launcher script? Will the ExternMetExtractor and the postIngestSuccess ExternAction script that I created to work with the crawler still work "as is" in CAS-PGE, or should I invoke them differently?

What about the metadata that I extracted with the crawler? Do I have to redefine the metadata elements in another configuration file or policy file? If there is any documentation on doing this, please point me to the right place, because I didn't see anything that addressed these kinds of questions.
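In case it helps to show where I'm coming from, this is the rough shape of the PgeConfig.xml I pieced together from the wiki examples. The element names are from my reading and may not be exactly right, and all the paths and regexps are placeholders, so please treat this as a sketch of my understanding, not working config:

```xml
<!-- Sketch of a CAS-PGE PgeConfig.xml based on the wiki examples.
     Paths, regexps, and the script name are placeholders I made up. -->
<pgeConfig>
  <!-- (b) how to run the task: CAS-PGE generates and runs this as a script -->
  <exe dir="[JobDir]" shell="/bin/sh">
    <cmd>[OODT_HOME]/pge/bin/my_pge_script [InputFile]</cmd>
  </exe>

  <!-- (c) output dirs that the met extraction / crawling step looks at afterwards;
       I am not sure which met writer class my ExternMetExtractor maps to -->
  <output>
    <dir path="[JobDir]/output" createBeforeExe="true">
      <files regExp=".*\.dat"/>
    </dir>
  </output>

  <!-- (a) metadata prepared for the task before it runs -->
  <customMetadata>
    <metadata key="InputFile" val="[OODT_HOME]/data/staging/input.dat"/>
  </customMetadata>
</pgeConfig>
```

Is that roughly the right picture, and is that where my existing ExternMetExtractor configuration would plug in?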
Do I have to define these any differently in the PGE configuration?

Thanks,
Val

Valerie A. Mallder
New Horizons Deputy Mission System Engineer
Johns Hopkins University/Applied Physics Laboratory

> -----Original Message-----
> From: Chris Mattmann [mailto:[email protected]]
> Sent: Tuesday, October 07, 2014 8:16 AM
> To: [email protected]
> Subject: Re: how to pass arguments to workflow task that is external script
>
> Hi Val,
>
> Thanks for the detailed report. My suggestion would be to use CAS-PGE directly
> instead of ExternScriptTaskInstance. That application is not well maintained,
> doesn't produce a log, etc., etc., all of the things you've noted.
>
> CAS-PGE, on the other hand, will (a) prepare input for your task; (b) describe
> how to run your task (even as a script, and it will generate the script); and
> (c) run met extractors and fork a crawler in your job directory at the end.
>
> I think it's what you're looking for, and it's much better documented on the
> wiki.
>
> Please check it out and let me know what you think.
>
> Cheers,
> Chris
>
> ------------------------
> Chris Mattmann
> [email protected]
>
>
>
>
> -----Original Message-----
> From: "Mallder, Valerie" <[email protected]>
> Reply-To: <[email protected]>
> Date: Monday, October 6, 2014 at 11:53 PM
> To: "[email protected]" <[email protected]>
> Subject: how to pass arguments to workflow task that is external script
>
> >Hello,
> >
> >I'm stuck again :( This time I'm stuck trying to start my crawler as a
> >task using the workflow manager. I am not using a PGE task right now.
> >I'm just trying to do something simple with the workflow manager,
> >filemgr, and crawler. I have read all of the documentation that is
> >available on the workflow manager and have tried to piece together a
> >setup based on the examples, but things seem to be working differently
> >now and the documentation hasn't caught up, which is totally
> >understandable and not a criticism.
> >Just want you to know that I try
> >to do my due diligence before bothering anyone for help.
> >
> >I am not running the resource manager, and I have commented out setting
> >the resource manager URL in the workflow.properties file so that the
> >workflow manager will execute the job locally.
> >
> >I am sending the workflow manager an event (via the command line using
> >wmgr-client) called "startJediPipeline". The workflow manager receives
> >the event, retrieves my workflow from the repository, tries to execute
> >the first (and only) task, and then it crashes. My task is an external
> >script (the crawler_launcher script) and I need to pass several
> >arguments to it. I've spent all day trying to figure out how to pass
> >arguments to the ExternScriptTaskInstance, but there are no examples of
> >doing this, so I had to wing it. I tried putting the arguments in the
> >task configuration properties. That didn't work. So I tried putting the
> >arguments in the metadata properties, and that hasn't worked either.
> >So, your suggestions are welcome! Thanks so much. Here's the error log,
> >and the contents of my tasks.xml file follow it at the end.
> >
> >Workflow Manager started PID file
> >(/homes/malldva1/project/jedi/users/jedi-pipeline/oodt-deploy/workflow/run/cas.workflow.pid).
> >Starting OODT File Manager      [ Successful ]
> >Starting OODT Resource Manager  [ Failed ]
> >Starting OODT Workflow Manager  [ Successful ]
> >slothrop:{~/project/jedi/users/jedi-pipeline/oodt-deploy/bin}
> >Oct 06, 2014 5:48:30 PM org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager loadProperties
> >INFO: Loading Workflow Manager Configuration Properties from:
> >[/homes/malldva1/project/jedi/users/jedi-pipeline/oodt-deploy/workflow/etc/workflow.properties]
> >Oct 06, 2014 5:48:30 PM org.apache.oodt.cas.workflow.engine.ThreadPoolWorkflowEngineFactory getResmgrUrl
> >INFO: No Resource Manager URL provided or malformed URL: executing jobs locally.
> >URL: [null]
> >Oct 06, 2014 5:48:30 PM org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager <init>
> >INFO: Workflow Manager started by malldva1
> >Oct 06, 2014 5:48:41 PM org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager handleEvent
> >INFO: WorkflowManager: Received event: startJediPipeline
> >Oct 06, 2014 5:48:41 PM org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager handleEvent
> >INFO: WorkflowManager: Workflow Jedi Pipeline Workflow retrieved for event startJediPipeline
> >Oct 06, 2014 5:48:41 PM org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread checkTaskRequiredMetadata
> >INFO: Task: [Crawler Task] has no required metadata fields
> >Oct 06, 2014 5:48:42 PM org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread executeTaskLocally
> >INFO: Executing task: [Crawler Task] locally
> >java.lang.NullPointerException
> >        at org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance.run(ExternScriptTaskInstance.java:72)
> >        at org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread.executeTaskLocally(IterativeWorkflowProcessorThread.java:574)
> >        at org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread.run(IterativeWorkflowProcessorThread.java:321)
> >        at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
> >        at java.lang.Thread.run(Thread.java:745)
> >Oct 06, 2014 5:48:42 PM org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread executeTaskLocally
> >WARNING: Exception executing task: [Crawler Task] locally: Message: null
> >
> >
> ><cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
> ><!--
> >  TODO: Add some examples
> >-->
> >  <task id="urn:oodt:crawlerTask" name="Crawler Task"
> >        class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance"/>
> >    <conditions/> <!-- There are no pre execution conditions right now -->
> >    <configuration>
> >      <property name="ShellType" value="/bin/sh" />
> >      <property name="PathToScript" value="[OODT_HOME]/crawler/bin/crawler_launcher"/>
> >    </configuration>
> >    <metadata>
> >      <args>
> >        <arg>--operation</arg>
> >        <arg>--launchAutoCrawler</arg>
> >        <arg>--productPath</arg>
> >        <arg>[OODT_HOME]/data/staging</arg>
> >        <arg>--filemgrUrl</arg>
> >        <arg>http://localhost:9000</arg>
> >        <arg>--clientTransferer</arg>
> >        <arg>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory</arg>
> >        <arg>--mimeExtractorRepo</arg>
> >        <arg>[$OODT_HOME]/extensions/policy/mime-extractor-map.xml</arg>
> >        <arg>--actionIds</arg>
> >        <arg>MoveFileToLevel0Dir</arg>
> >      </args>
> >    </metadata>
> ></cas:tasks>
> >
> >
> >Valerie A. Mallder
> >
> >New Horizons Deputy Mission System Engineer
> >The Johns Hopkins University/Applied Physics Laboratory
> >11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
> >240-228-7846 (Office) 410-504-2233 (Blackberry)
>
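P.S. Looking at my tasks.xml quoted above one more time: the <task> element is self-closed (it ends with `.../ExternScriptTaskInstance"/>`), so my <conditions>, <configuration>, and <metadata> blocks actually sit outside the task rather than nested inside it. Could that be what causes the NullPointerException, since ExternScriptTaskInstance would then see an empty task configuration? Here is the nesting I believe I intended. I still don't know whether the workflow manager will pass <arg> elements from <metadata> through to the script, so I've left that part out:

```xml
<cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <!-- Note: the task tag is no longer self-closed; configuration is
       nested inside it and </task> closes the element afterwards. -->
  <task id="urn:oodt:crawlerTask" name="Crawler Task"
        class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
    <conditions/> <!-- no pre-execution conditions right now -->
    <configuration>
      <property name="ShellType" value="/bin/sh"/>
      <property name="PathToScript"
                value="[OODT_HOME]/crawler/bin/crawler_launcher"/>
    </configuration>
  </task>
</cas:tasks>
```

Whether crawler_launcher's command-line arguments can be passed to ExternScriptTaskInstance at all is exactly the part I'm unsure about.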
