Thanks Val, I agree, yes, CAS-PGE is complex. Did you see the Learn by Example wiki page:
https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example

I think it's pretty basic and illustrates what CAS-PGE does. Basically the gist of it is:

1. You only need to create a PGEConfig.xml file that specifies:
   - how to generate input for your integrated algorithm
   - how to execute your algorithm (e.g., how to generate a script that executes it)
   - how to generate metadata from the output, and then crawl the files + met and get the outputs into the file manager

2. You go into the workflow tasks.xml, define a new CAS-PGE type task, point it at this config file, and provide the CAS-PGE task properties; an example is here:
http://svn.apache.org/repos/asf/oodt/trunk/pge/src/main/resources/examples/WorkflowTask/

If you want to see a basic example of CAS-PGE in action, check out DRAT:
https://github.com/chrismattmann/drat/

It's a RADIX-based deployment with 2 CAS-PGEs (one for the MIME partition, and another for RAT). Check that out, see how DRAT works (and integrates CAS-PGE), and then let me know if you are still confused and I will be glad to help more.

Cheers,
Chris

------------------------
Chris Mattmann
[email protected]
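
For orientation, here is a rough sketch of what the PGEConfig.xml in step 1 might look like. The element names (pgeConfig, exe, output, customMetadata) are as they appear in the CAS-PGE examples on the Learn by Example page, but double-check them there; the job directory, the command being run, the output pattern, and the met file writer class below are placeholders made up for this sketch, not taken from this thread.

  <?xml version="1.0" encoding="UTF-8"?>
  <pgeConfig>
    <!-- how to execute the algorithm: CAS-PGE turns these commands into a
         script and runs it with the given shell in the working directory -->
    <exe dir="[JobDir]" shellType="/bin/sh">
      <cmd>[OODT_HOME]/pge/bin/run_my_algorithm.sh [JobDir]</cmd>
    </exe>

    <!-- how to generate metadata from the output: the crawler that CAS-PGE
         forks at the end picks up files matching the pattern, runs the met
         file writer on them, and ingests files + met into the file manager -->
    <output>
      <dir path="[JobDir]">
        <files regExp=".*\.dat"
               metFileWriterClass="gov.mymission.jedi.JediPgeMetFileWriter"/>
      </dir>
    </output>

    <!-- metadata/input made available to the commands above -->
    <customMetadata>
      <metadata key="JobDir" val="[OODT_HOME]/data/jobs/jedi"/>
    </customMetadata>
  </pgeConfig>
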
-----Original Message-----
From: "Mallder, Valerie" <[email protected]>
Reply-To: <[email protected]>
Date: Tuesday, October 7, 2014 at 4:56 PM
To: "[email protected]" <[email protected]>
Subject: RE: how to pass arguments to workflow task that is external script

>Thanks Chris,
>
>The CAS-PGE is pretty complex. I've read the documentation and it is
>still way over my head. Is there any documentation or examples for how
>to integrate the crawler into it? For instance, can I still use the
>crawler_launcher script? Will the ExternMetExtractor and the
>postIngestSuccess ExternAction script that I created to work with the
>crawler still work "as is" in CAS-PGE? Or should I invoke them
>differently? What about the metadata that I extracted with the crawler?
>Do I have to redefine the metadata elements in another configuration file
>or policy file? If there is any documentation on doing this, please point
>me to the right place, because I didn't see anything that addressed these
>kinds of questions.
>
>Thanks,
>Val
>
>Do I have to define these any differently in the PGE configuration
>
>
>Valerie A. Mallder
>New Horizons Deputy Mission System Engineer
>Johns Hopkins University/Applied Physics Laboratory
>
>> -----Original Message-----
>> From: Chris Mattmann [mailto:[email protected]]
>> Sent: Tuesday, October 07, 2014 8:16 AM
>> To: [email protected]
>> Subject: Re: how to pass arguments to workflow task that is external script
>>
>> Hi Val,
>>
>> Thanks for the detailed report. My suggestion would be to use CAS-PGE
>> directly instead of ExternScriptTaskInstance. That application is not
>> well maintained, doesn't produce a log, etc. etc., all of the things
>> you've noted.
>>
>> CAS-PGE, on the other hand, will (a) prepare input for your task; (b)
>> describe how to run your task (even as a script, and it will generate
>> the script); and (c) run met extractors and fork a crawler in your job
>> directory at the end.
>>
>> I think it's what you're looking for, and it's way better documented
>> on the wiki.
>>
>> Please check it out and let me know what you think.
>>
>> Cheers,
>> Chris
>>
>> ------------------------
>> Chris Mattmann
>> [email protected]
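
To make the CAS-PGE suggestion concrete against the tasks.xml quoted below, a CAS-PGE version of the Crawler Task would look roughly like this. The class org.apache.oodt.cas.pge.StdPGETaskInstance is the CAS-PGE workflow task class from the OODT PGE module; the task id, the PgeConfig path, and the PGETask_*/PCS_* property names here are illustrative, so take the exact keys from the WorkflowTask examples linked above and from DRAT's policy files.

  <task id="urn:oodt:jediPgeTask" name="Jedi PGE Task"
        class="org.apache.oodt.cas.pge.StdPGETaskInstance">
    <conditions/>
    <configuration>
      <!-- property names are illustrative; check the linked WorkflowTask
           examples for the exact keys your OODT release expects -->
      <property name="PGETask_Name" value="JediPgeTask"/>
      <property name="PGETask_ConfigFilePath"
                value="[OODT_HOME]/pge/policy/jedi-pge-config.xml"/>
      <property name="PCS_FileManagerUrl" value="http://localhost:9000"/>
      <property name="PCS_ClientTransferServiceFactory"
                value="org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory"/>
    </configuration>
  </task>
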
>> -----Original Message-----
>> From: "Mallder, Valerie" <[email protected]>
>> Reply-To: <[email protected]>
>> Date: Monday, October 6, 2014 at 11:53 PM
>> To: "[email protected]" <[email protected]>
>> Subject: how to pass arguments to workflow task that is external script
>>
>> >Hello,
>> >
>> >I'm stuck again :( This time I'm stuck trying to start my crawler as a
>> >task using the workflow manager. I am not using a PGE task right now.
>> >I'm just trying to do something simple with the workflow manager,
>> >filemgr, and crawler. I have read all of the documentation that is
>> >available on the workflow manager and have tried to piece together a
>> >setup based on the examples, but things seem to be working differently
>> >now and the documentation hasn't caught up, which is totally
>> >understandable and not a criticism. I just want you to know that I try
>> >to do my due diligence before bothering anyone for help.
>> >
>> >I am not running the resource manager, and I have commented out setting
>> >the resource manager URL in the workflow.properties file so that the
>> >workflow manager will execute the job locally.
>> >
>> >I am sending the workflow manager an event (via the command line using
>> >wmgr-client) called "startJediPipeline". The workflow manager receives
>> >the event, retrieves my workflow from the repository, tries to execute
>> >the first (and only) task, and then it crashes. My task is an external
>> >script (the crawler_launcher script) and I need to pass several
>> >arguments to it. I've spent all day trying to figure out how to pass
>> >arguments to the ExternScriptTaskInstance, but there are no examples of
>> >doing this, so I had to wing it. I tried putting the arguments in the
>> >task configuration properties. That didn't work. So I tried putting the
>> >arguments in the metadata properties, and that hasn't worked. So, your
>> >suggestions are welcome! Thanks so much. Here's the error log, and the
>> >contents of my tasks.xml file follow it at the end.
>> >
>> >Workflow Manager started PID file
>> >(/homes/malldva1/project/jedi/users/jedi-pipeline/oodt-deploy/workflow/run/cas.workflow.pid).
>> >Starting OODT File Manager [ Successful ]
>> >Starting OODT Resource Manager [ Failed ]
>> >Starting OODT Workflow Manager [ Successful ]
>> >slothrop:{~/project/jedi/users/jedi-pipeline/oodt-deploy/bin} Oct 06, 2014 5:48:30 PM
>> >org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager loadProperties
>> >INFO: Loading Workflow Manager Configuration Properties from:
>> >[/homes/malldva1/project/jedi/users/jedi-pipeline/oodt-deploy/workflow/etc/workflow.properties]
>> >Oct 06, 2014 5:48:30 PM
>> >org.apache.oodt.cas.workflow.engine.ThreadPoolWorkflowEngineFactory getResmgrUrl
>> >INFO: No Resource Manager URL provided or malformed URL: executing jobs locally.
>> >URL: [null]
>> >Oct 06, 2014 5:48:30 PM
>> >org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager <init>
>> >INFO: Workflow Manager started by malldva1
>> >Oct 06, 2014 5:48:41 PM
>> >org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager handleEvent
>> >INFO: WorkflowManager: Received event: startJediPipeline
>> >Oct 06, 2014 5:48:41 PM
>> >org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager handleEvent
>> >INFO: WorkflowManager: Workflow Jedi Pipeline Workflow retrieved for event startJediPipeline
>> >Oct 06, 2014 5:48:41 PM
>> >org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread checkTaskRequiredMetadata
>> >INFO: Task: [Crawler Task] has no required metadata fields
>> >Oct 06, 2014 5:48:42 PM
>> >org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread executeTaskLocally
>> >INFO: Executing task: [Crawler Task] locally
>> >java.lang.NullPointerException
>> >        at org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance.run(ExternScriptTaskInstance.java:72)
>> >        at org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread.executeTaskLocally(IterativeWorkflowProcessorThread.java:574)
>> >        at org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread.run(IterativeWorkflowProcessorThread.java:321)
>> >        at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
>> >        at java.lang.Thread.run(Thread.java:745)
>> >Oct 06, 2014 5:48:42 PM
>> >org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread executeTaskLocally
>> >WARNING: Exception executing task: [Crawler Task] locally: Message: null
>> >
>> >
>> ><cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>> ><!--
>> >    TODO: Add some examples
>> >-->
>> >  <task id="urn:oodt:crawlerTask" name="Crawler Task"
>> >        class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
>> >    <conditions/> <!-- There are no pre execution conditions right now -->
>> >    <configuration>
>> >      <property name="ShellType" value="/bin/sh" />
>> >      <property name="PathToScript" value="[OODT_HOME]/crawler/bin/crawler_launcher"/>
>> >    </configuration>
>> >    <metadata>
>> >      <args>
>> >        <arg>--operation</arg>
>> >        <arg>--launchAutoCrawler</arg>
>> >        <arg>--productPath</arg>
>> >        <arg>[OODT_HOME]/data/staging</arg>
>> >        <arg>--filemgrUrl</arg>
>> >        <arg>http://localhost:9000</arg>
>> >        <arg>--clientTransferer</arg>
>> >        <arg>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory</arg>
>> >        <arg>--mimeExtractorRepo</arg>
>> >        <arg>[$OODT_HOME]/extensions/policy/mime-extractor-map.xml</arg>
>> >        <arg>--actionIds</arg>
>> >        <arg>MoveFileToLevel0Dir</arg>
>> >      </args>
>> >    </metadata>
>> >  </task>
>> ></cas:tasks>
>> >
>> >
>> >Valerie A. Mallder
>> >
>> >New Horizons Deputy Mission System Engineer
>> >The Johns Hopkins University/Applied Physics Laboratory
>> >11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
>> >240-228-7846 (Office) 410-504-2233 (Blackberry)
>> >
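
For readability, the crawler_launcher command that the Crawler Task above appears to be aiming for would look like this if run by hand (the flags and values are taken verbatim from the tasks.xml; [OODT_HOME] and [$OODT_HOME] stand in for the deployment root and would be real paths in practice, and whether ExternScriptTaskInstance actually passes these args through to the script is exactly the open question in this thread):

  [OODT_HOME]/crawler/bin/crawler_launcher \
    --operation --launchAutoCrawler \
    --productPath [OODT_HOME]/data/staging \
    --filemgrUrl http://localhost:9000 \
    --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
    --mimeExtractorRepo [$OODT_HOME]/extensions/policy/mime-extractor-map.xml \
    --actionIds MoveFileToLevel0Dir
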
