Yes, I saw the learn-by-example documentation. Maybe if I hadn't already built a configuration that runs the crawler successfully, the example would make more sense to me. It doesn't look like anything I have already done can be used in that setup. It is not clear whether I have to move, duplicate, or rewrite my extractor, my metadata definitions, and so on. Is the metadata extracted by the crawler's extractor shared with other PGE tasks? The example uses a different directory structure than my crawler understands, and it's not clear how to map the example directories to crawler directories. In the example's tasks.xml file, it's not clear whether the configuration shown has to be defined for every task in the file. A lot of questions come up when I try to adapt the example to my system specifically. But I will look at DRAT and try to work through it.

Thanks,
Val

Valerie A. Mallder
New Horizons Deputy Mission System Engineer
Johns Hopkins University/Applied Physics Laboratory
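A note on the per-task configuration question raised above: in the workflow manager's tasks.xml format, each <task> element carries its own <configuration> block, so properties like those in the example are scoped to, and repeated for, each task that needs them. Below is a minimal sketch of that shape, using the same conventions as the tasks.xml quoted at the bottom of this thread; the ids, names, class names, and property values are placeholders, not a working setup:

<cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <task id="urn:oodt:taskOne" name="Task One"
        class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
    <conditions/>
    <configuration>
      <!-- these properties apply to Task One only -->
      <property name="ShellType" value="/bin/sh"/>
      <property name="PathToScript" value="[OODT_HOME]/scripts/task_one.sh"/>
    </configuration>
  </task>
  <task id="urn:oodt:taskTwo" name="Task Two"
        class="org.apache.oodt.cas.pge.StdPGETaskInstance">
    <conditions/>
    <configuration>
      <!-- Task Two declares its own, separate configuration block -->
    </configuration>
  </task>
</cas:tasks>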
> -----Original Message-----
> From: Chris Mattmann [mailto:[email protected]]
> Sent: Tuesday, October 07, 2014 11:02 AM
> To: [email protected]
> Subject: Re: how to pass arguments to workflow task that is external script
>
> Thanks Val, I agree, yes, CAS-PGE is complex.
>
> Did you see the learn by example wiki page:
>
> https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>
> I think it's pretty basic and illustrates what CAS-PGE does.
>
> Basically the gist of it is:
>
> 1. you only need to create a PGEConfig.xml file that specifies:
>    - how to generate input for your integrated algorithm
>    - how to execute your algorithm (e.g., how to generate a script that executes it)
>    - how to generate metadata from the output, and then to crawl the files + met and get the outputs into the file manager
>
> 2. you go into workflow tasks.xml, define a new CAS-PGE type task, point at this config file, and provide CAS-PGE task properties (an example is here:
> http://svn.apache.org/repos/asf/oodt/trunk/pge/src/main/resources/examples/WorkflowTask/)
>
> If you want to see a basic example of CAS-PGE in action, check out DRAT:
>
> https://github.com/chrismattmann/drat/
>
> It's a RADIX-based deployment with 2 CAS-PGEs (one for the MIME partition; and another for RAT).
>
> Check that out, see how DRAT works (and integrates CAS-PGE), and then let me know if you are still confused and I will be glad to help more.
>
> Cheers,
> Chris
>
> ------------------------
> Chris Mattmann
> [email protected]
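To make step 1 above concrete, here is a minimal sketch of the shape a PGEConfig.xml can take, patterned after the learn-by-example page. Element names can vary between OODT releases, and the command, paths, and metadata keys below are placeholders, not a working configuration:

<pgeConfig>
  <!-- (how to execute): CAS-PGE writes these commands into a generated
       script and runs it with the given shell in the job directory -->
  <exe dir="[JobDir]" shell="/bin/sh">
    <cmd>[OODT_HOME]/pge/bin/my_algorithm [JobDir]/input [OutputDir]</cmd>
  </exe>
  <!-- (how to collect output): after the script exits, CAS-PGE crawls
       this directory, runs met extraction, and ingests the products
       into the file manager -->
  <output>
    <dir path="[OutputDir]" createBeforeExe="true"/>
  </output>
  <!-- (how to generate input): dynamic metadata made available to the
       elements above through the [bracket] replacement syntax -->
  <customMetadata>
    <metadata key="JobDir" val="[OODT_HOME]/data/jobs/jedi"/>
    <metadata key="OutputDir" val="[JobDir]/output"/>
  </customMetadata>
</pgeConfig>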
> -----Original Message-----
> From: "Mallder, Valerie" <[email protected]>
> Reply-To: <[email protected]>
> Date: Tuesday, October 7, 2014 at 4:56 PM
> To: "[email protected]" <[email protected]>
> Subject: RE: how to pass arguments to workflow task that is external script
>
> >Thanks Chris,
> >
> >The CAS-PGE is pretty complex. I've read the documentation and it is
> >still way over my head. Is there any documentation or examples for how
> >to integrate the crawler into it? For instance, can I still use the
> >crawler_launcher script? Will the ExternMetExtractor and the
> >postIngestSuccess ExternAction script that I created to work with the
> >crawler still work "as is" in CAS-PGE? Or should I invoke them
> >differently? What about the metadata that I extracted with the crawler?
> >Do I have to redefine the metadata elements in another configuration
> >file or policy file? If there is any documentation on doing this,
> >please point me to the right place, because I didn't see anything that
> >addressed these kinds of questions.
> >
> >Thanks,
> >Val
> >
> >Valerie A. Mallder
> >New Horizons Deputy Mission System Engineer
> >Johns Hopkins University/Applied Physics Laboratory
> >
> >> -----Original Message-----
> >> From: Chris Mattmann [mailto:[email protected]]
> >> Sent: Tuesday, October 07, 2014 8:16 AM
> >> To: [email protected]
> >> Subject: Re: how to pass arguments to workflow task that is external script
> >>
> >> Hi Val,
> >>
> >> Thanks for the detailed report. My suggestion would be to use CAS-PGE
> >> directly instead of ExternScriptTaskInstance. That application is not
> >> well maintained, doesn't produce a log, etc., etc., all of the things
> >> you've noted.
> >>
> >> CAS-PGE, on the other hand, will (a) prepare input for your task; (b)
> >> describe how to run your task (even as a script, and will generate a
> >> script); and (c) run met extractors and fork a crawler in your job
> >> directory at the end.
> >>
> >> I think it's what you're looking for, and it's much better documented
> >> on the wiki.
> >>
> >> Please check it out and let me know what you think.
> >>
> >> Cheers,
> >> Chris
> >>
> >> ------------------------
> >> Chris Mattmann
> >> [email protected]
> >> -----Original Message-----
> >> From: "Mallder, Valerie" <[email protected]>
> >> Reply-To: <[email protected]>
> >> Date: Monday, October 6, 2014 at 11:53 PM
> >> To: "[email protected]" <[email protected]>
> >> Subject: how to pass arguments to workflow task that is external script
> >>
> >> >Hello,
> >> >
> >> >I'm stuck again :( This time I'm stuck trying to start my crawler as
> >> >a task using the workflow manager. I am not using a PGE task right now.
> >> >I'm just trying to do something simple with the workflow manager,
> >> >filemgr, and crawler. I have read all of the documentation that is
> >> >available on the workflow manager and have tried to piece together a
> >> >setup based on the examples, but things seem to be working
> >> >differently now and the documentation hasn't caught up, which is
> >> >totally understandable and not a criticism. I just want you to know
> >> >that I try to do my due diligence before bothering anyone for help.
> >> >
> >> >I am not running the resource manager, and I have commented out
> >> >setting the resource manager URL in the workflow.properties file so
> >> >that the workflow manager will execute the job locally.
> >> >
> >> >I am sending the workflow manager an event (via the command line using
> >> >wmgr-client) called "startJediPipeline". The workflow manager receives
> >> >the event, retrieves my workflow from the repository, tries to execute
> >> >the first (and only) task, and then it crashes. My task is an external
> >> >script (the crawler_launcher script) and I need to pass several
> >> >arguments to it. I've spent all day trying to figure out how to pass
> >> >arguments to the ExternScriptTaskInstance, but there are no examples
> >> >of doing this, so I had to wing it. I tried putting the arguments in
> >> >the task configuration properties. That didn't work. So I tried
> >> >putting the arguments in the metadata properties, and that hasn't
> >> >worked either. So, your suggestions are welcome! Thanks so much.
> >> >Here's the error log; the contents of my tasks.xml file follow it at
> >> >the end.
> >> >
> >> >Workflow Manager started PID file
> >> >(/homes/malldva1/project/jedi/users/jedi-pipeline/oodt-deploy/workflow/run/cas.workflow.pid).
> >> >Starting OODT File Manager [ Successful ]
> >> >Starting OODT Resource Manager [ Failed ]
> >> >Starting OODT Workflow Manager [ Successful ]
> >> >slothrop:{~/project/jedi/users/jedi-pipeline/oodt-deploy/bin}
> >> >Oct 06, 2014 5:48:30 PM org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager loadProperties
> >> >INFO: Loading Workflow Manager Configuration Properties from:
> >> >[/homes/malldva1/project/jedi/users/jedi-pipeline/oodt-deploy/workflow/etc/workflow.properties]
> >> >Oct 06, 2014 5:48:30 PM org.apache.oodt.cas.workflow.engine.ThreadPoolWorkflowEngineFactory getResmgrUrl
> >> >INFO: No Resource Manager URL provided or malformed URL: executing jobs locally. URL: [null]
> >> >Oct 06, 2014 5:48:30 PM org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager <init>
> >> >INFO: Workflow Manager started by malldva1
> >> >Oct 06, 2014 5:48:41 PM org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager handleEvent
> >> >INFO: WorkflowManager: Received event: startJediPipeline
> >> >Oct 06, 2014 5:48:41 PM org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManager handleEvent
> >> >INFO: WorkflowManager: Workflow Jedi Pipeline Workflow retrieved for event startJediPipeline
> >> >Oct 06, 2014 5:48:41 PM org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread checkTaskRequiredMetadata
> >> >INFO: Task: [Crawler Task] has no required metadata fields
> >> >Oct 06, 2014 5:48:42 PM org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread executeTaskLocally
> >> >INFO: Executing task: [Crawler Task] locally
> >> >java.lang.NullPointerException
> >> >        at org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance.run(ExternScriptTaskInstance.java:72)
> >> >        at org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread.executeTaskLocally(IterativeWorkflowProcessorThread.java:574)
> >> >        at org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread.run(IterativeWorkflowProcessorThread.java:321)
> >> >        at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
> >> >        at java.lang.Thread.run(Thread.java:745)
> >> >Oct 06, 2014 5:48:42 PM org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread executeTaskLocally
> >> >WARNING: Exception executing task: [Crawler Task] locally: Message: null
> >> >
> >> ><cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
> >> ><!--
> >> >  TODO: Add some examples
> >> >-->
> >> >  <task id="urn:oodt:crawlerTask" name="Crawler Task"
> >> >        class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance"/>
> >> >    <conditions/> <!-- There are no pre execution conditions right now -->
> >> >    <configuration>
> >> >      <property name="ShellType" value="/bin/sh" />
> >> >      <property name="PathToScript" value="[OODT_HOME]/crawler/bin/crawler_launcher"/>
> >> >    </configuration>
> >> >    <metadata>
> >> >      <args>
> >> >        <arg>--operation</arg>
> >> >        <arg>--launchAutoCrawler</arg>
> >> >        <arg>--productPath</arg>
> >> >        <arg>[OODT_HOME]/data/staging</arg>
> >> >        <arg>--filemgrUrl</arg>
> >> >        <arg>http://localhost:9000</arg>
> >> >        <arg>--clientTransferer</arg>
> >> >        <arg>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory</arg>
> >> >        <arg>--mimeExtractorRepo</arg>
> >> >        <arg>[$OODT_HOME]/extensions/policy/mime-extractor-map.xml</arg>
> >> >        <arg>--actionIds</arg>
> >> >        <arg>MoveFileToLevel0Dir</arg>
> >> >      </args>
> >> >    </metadata>
> >> ></cas:tasks>
> >> >
> >> >Valerie A. Mallder
> >> >
> >> >New Horizons Deputy Mission System Engineer
> >> >The Johns Hopkins University/Applied Physics Laboratory
> >> >11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
> >> >240-228-7846 (Office) 410-504-2233 (Blackberry)
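A closing observation on the tasks.xml above: the <task> element is self-closed (the "/>" at the end of the class attribute), so the <conditions>, <configuration>, and <metadata> blocks that follow it are siblings of the task rather than children. If the file on disk really looks like this, ExternScriptTaskInstance would receive no ShellType or PathToScript property, which is one plausible source of the NullPointerException at line 72. A sketch of the corrected nesting, keeping the original content unchanged; note that whether ExternScriptTaskInstance actually reads an <args> list out of task metadata is a separate, unverified question, since that convention was improvised in the original message:

<cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <task id="urn:oodt:crawlerTask" name="Crawler Task"
        class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
    <conditions/> <!-- no pre-execution conditions right now -->
    <configuration>
      <!-- with the task element properly closed below, these properties
           are now actually attached to the Crawler Task -->
      <property name="ShellType" value="/bin/sh"/>
      <property name="PathToScript" value="[OODT_HOME]/crawler/bin/crawler_launcher"/>
    </configuration>
    <metadata>
      <!-- unverified: this <args> convention may not be consumed by the
           task class; the arguments are repeated verbatim from above -->
      <args>
        <arg>--operation</arg>
        <arg>--launchAutoCrawler</arg>
        <arg>--productPath</arg>
        <arg>[OODT_HOME]/data/staging</arg>
        <arg>--filemgrUrl</arg>
        <arg>http://localhost:9000</arg>
        <arg>--clientTransferer</arg>
        <arg>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory</arg>
        <arg>--mimeExtractorRepo</arg>
        <arg>[$OODT_HOME]/extensions/policy/mime-extractor-map.xml</arg>
        <arg>--actionIds</arg>
        <arg>MoveFileToLevel0Dir</arg>
      </args>
    </metadata>
  </task>
</cas:tasks>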
