Can you do an ls -al of your /lib directory please? Also, can you please
provide any relevant snippet of your pom.xml which contains the filemgr
dependency?

Thank you,
Lewis
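P.S. For reference, the filemgr entry in a pom.xml usually looks something
like the sketch below (the version is a placeholder - use whatever OODT
release you build against):

    <dependency>
      <groupId>org.apache.oodt</groupId>
      <artifactId>cas-filemgr</artifactId>
      <!-- placeholder: match the OODT release you build against -->
      <version>0.7</version>
    </dependency>

Also, the NoClassDefFoundError below is thrown from the Workflow Manager's
engine thread (IterativeWorkflowProcessorThread), so it may be the workflow
manager's lib directory, rather than resmgr/lib, that is missing the
filemgr jar. Something like this should show whether each component can
see it:

    ls -al $OODT_HOME/wmgr/lib | grep filemgr
    ls -al $OODT_HOME/resmgr/lib | grep filemgr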
On Thu, Oct 9, 2014 at 2:20 PM, Mallder, Valerie <[email protected]> wrote:

> Thanks Chris, (Thanks everyone for all of the help, it was helpful,
> really it was :) )
>
> My brain is exhausted ..... (heavy sigh) and I feel like I have to start
> all over again.
>
> My intention (after I got the crawler and filemanager working together
> last week) was to integrate it with the workflow manager to demonstrate
> launching a workflow that consisted of a simple script that runs before
> the crawler, and then run the crawler. After that, I was going to try to
> integrate a Java application into the workflow, and continue integrating
> new things step by step. I think everything would have been fine in this
> simple setup if I could have just gotten the ExternScriptTaskInstance to
> run. But that was a huge fail. It doesn't look like the test program for
> that class tests it the way I want to use it, so I have no idea whether
> it actually works or not. The code implies that you can specify arguments
> to your external script, but I could not find a way to get them read in.
> The getAllMetadata method always returned a null list of arguments, which
> causes an exception on line 72.
>
> So right now I've basically gone back to the beginning of using CAS-PGE,
> and I'm trying to get the crawler to run, as the very first step in my
> pipeline, on the raw telemetry files that are dropped off by FEI. After
> the ingestion and archival, one of the postIngestSuccess actions of the
> crawler copies all of the new raw telemetry files to a directory where we
> store all of the level 0 files. The level 0 directory (and all of its
> subdirectories and files) is what I consider to be the "output" of this
> simple first step of the pipeline. I realize that I may need to start a
> crawler again at a later point in the pipeline, but I want to focus on
> one step at a time.
>
> Chris, in regards to your comments below, here are two questions followed
> by the contents of my .xml files for review.
>
> [1] When you say "define blocks in the <output>..</output> section of
> the XML file", what xml file are you referring to? I think the
> <output>..</output> tags can only go in the PGE config file, is that
> correct?
>
> Here is what I have in my fei-crawler-pge-config.xml file. Is this OK?
>
>   <!-- Files to ingest -->
>   <output>
>     <dir="[FEI_DROP_DIR]" envReplace="true" />
>   </output>
>
> [2] If I don't need to define a CAS-PGE Task, how do I tell the workflow
> to start the crawler? Right now I am trying to do it with a task, but if
> you can tell me how to do it without a task, I will be happy to try it.
> So, here is my current workflow:
>
> <cas:workflow xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas"
>     id="urn:oodt:jediWorkflowId" name="jediWorkflowName">
>   <tasks>
>     <task id="urn:oodt:feiCrawlerTaskId" name="feiCrawlerTaskName" />
>   </tasks>
> </cas:workflow>
>
> Here is my current task:
>
> <cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>   <task id="urn:oodt:feiCrawlerTaskId" name="feiCrawlerTaskName"
>       class="org.apache.oodt.cas.pge.StdPGETaskInstance">
>     <configuration>
>       <property name="PGETask/Name" value="feiCrawlerTaskname"/>
>       <property name="PGETask/ConfigFilePath"
>           value="[OODT_HOME]/extensions/config/fei-crawler-pge-config.xml"
>           envReplace="true"/>
>       <property name="PGETask/DumpMetadata" value="true"/>
>       <property name="PGETask/WorkflowManagerUrl"
>           value="[WORKFLOW_URL]" envReplace="true"/>
>       <property name="PGETask/Query/FileManagerUrl"
>           value="[FILEMGR_URL]" envReplace="true"/>
>       <property name="PGETask/Ingest/FileManagerUrl"
>           value="[FILEMGR_URL]" envReplace="true"/>
>       <property name="PGETask/Query/ClientTransferServiceFactory"
>           value="org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory"/>
>       <property name="PGETask/Ingest/CrawlerConfigFile"
>           value="file:[CRAWLER_HOME]/policy/crawler-config.xml"
>           envReplace="true"/>
>       <property name="PGETask/Ingest/MimeExtractorRepo"
>           value="file:[OODT_HOME]/extensions/policy/mime-extractor-map.xml"
>           envReplace="true"/>
>       <property name="PGETask/Ingest/ActionIds"
>           value="MoveFileToLevel0Dir" envReplace="true"/>
>       <property name="PGE_HOME" value="[PGE_HOME]" envReplace="true"/>
>     </configuration>
>   </task>
> </cas:tasks>
>
> And here is my current PGE config - fei-crawler-pge-config.xml:
>
> <pgeConfig>
>   <!-- How to run the PGE -->
>   <exe dir="[OODT_HOME]">
>     <cmd>mkdir [JobDir]</cmd>
>   </exe>
>
>   <!-- Files to ingest -->
>   <output>
>     <dir="[FEI_DROP_DIR]" envReplace="true" />
>   </output>
>
>   <!-- Custom metadata to add to output files -->
>   <customMetadata>
>     <metadata key="JobDir" value="[OODT_HOME]/data/pge/jobs" />
>   </customMetadata>
> </pgeConfig>
>
> With these settings I do not get to the point where the first command in
> the PGE config gets executed. The data/pge/jobs directory does not get
> created. However, the workflow starts, the task gets submitted to the
> resource manager, and a new thread called "Thread-2" gets spawned. But
> "Thread-2" gets an exception and that's it. I thought maybe it was due to
> the fact that the filemgr jar is not in the resmgr/lib directory when you
> do the radix install. So I copied the filemgr jar file to resmgr/lib and
> ran again, but I still get the same exception. And the filemgr IS
> running, and I shut down all of the filemgr, workflow mgr, resmgr, and
> batch_stub each time I run, so that every run starts with new processes.
>
> If anyone has any recommendations on a better way to do this, please let
> me know.
> Thanks,
> Val
>
> INFO: Task: [feiCrawlerTaskName] has no required metadata fields
> Exception in thread "Thread-2" java.lang.NoClassDefFoundError:
> org/apache/oodt/cas/filemgr/metadata/CoreMetKeys
>     at java.lang.ClassLoader.defineClass1(Native Method)
>     at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>     at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     at java.lang.ClassLoader.defineClass1(Native Method)
>     at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>     at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:190)
>     at org.apache.oodt.cas.workflow.util.GenericWorkflowObjectFactory.getTaskObjectFromClassName(GenericWorkflowObjectFactory.java:169)
>     at org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread.run(IterativeWorkflowProcessorThread.java:222)
>     at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.oodt.cas.filemgr.metadata.CoreMetKeys
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     ... 28 more
>
> Valerie A. Mallder
> New Horizons Deputy Mission System Engineer
> Johns Hopkins University/Applied Physics Laboratory
>
> > -----Original Message-----
> > From: Chris Mattmann [mailto:[email protected]]
> > Sent: Wednesday, October 08, 2014 2:52 PM
> > To: [email protected]
> > Subject: Re: what is batch stub? Is it necessary?
> >
> > Hi Val,
> >
> > I don't think you need to run a CAS-PGE task to call crawler_launcher.
> > If you define blocks in the <output>..</output> section of the XML
> > file, a crawler will be forked in the job working directory of CAS-PGE
> > and will crawl your specified output.
> >
> > I believe that will accomplish the same goal as what you are looking
> > for.
> >
> > No need to have crawling be a separate task from CAS-PGE - CAS-PGE will
> > do the crawling for you! :)
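> > For example, a well-formed <output> block in the PGE config file looks
> > something like this (a sketch - double-check the attribute names
> > against the CAS-PGE version you are running; "path" and
> > "createBeforeExe" are the usual ones):
> >
> >   <!-- Files to ingest: CAS-PGE forks a crawler over each <dir> -->
> >   <output>
> >     <dir path="[FEI_DROP_DIR]" createBeforeExe="false" envReplace="true"/>
> >   </output>
> >
> > Note that the dir element takes a named path attribute - a bare
> > <dir="..."> is not valid XML and the config parser will not accept it.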
> > Cheers,
> > Chris
> >
> > ------------------------
> > Chris Mattmann
> > [email protected]
> >
> >
> > -----Original Message-----
> > From: "Verma, Rishi (398J)" <[email protected]>
> > Reply-To: <[email protected]>
> > Date: Thursday, October 9, 2014 at 2:44 AM
> > To: "[email protected]" <[email protected]>
> > Subject: Re: what is batch stub? Is it necessary?
> >
> > >Hi Val,
> > >
> > >Yep - here's a link to the tasks.xml file:
> > >https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/policy/tasks.xml
> > >
> > >> The problem is that the ExternScriptTaskInstance is unable to
> > >> recognize the command line arguments that I want to pass to the
> > >> crawler_launcher script.
> > >
> > >Hmm.. could you share your workflow manager log, or better yet, the
> > >batch_stub output? Curious to see what error is thrown.
> > >
> > >Is a script file being generated for your PGE? For example, inside your
> > >[PGE_HOME] directory, and within the particular job directory created
> > >for your execution of a workflow, you will see some files starting with
> > >"sciPgeExeScript_...". You'll find one for your pgeConfig, and you can
> > >check what the PGE commands actually translate into, with respect to a
> > >shell script format. If that file is there, take a look at it, and
> > >validate whether the command works within the script (i.e. copy/paste
> > >and run the crawler command manually).
> > >
> > >Another suggestion is to take a step back, and build up slowly, i.e.:
> > >1. Do an "echo" command within your PGE first (e.g. <cmd>echo "Hello
> > >   APL." > /tmp/test.txt</cmd>).
> > >2. If the above works, do an empty crawler_launcher command (e.g.
> > >   <cmd>/path/to/oodt/crawler/bin/crawler_launcher</cmd>) and verify
> > >   that the batch_stub or Workflow Manager prints some kind of output
> > >   when you run the workflow.
> > >3. Build up your crawler_launcher command piece by piece to see where
> > >   it is failing.
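> > >As a sketch, step 3 might build up like this inside your <exe> block
> > >(flags taken from the config you posted below - verify each one
> > >against crawler_launcher's usage output on your install):
> > >
> > >  <!-- 3a: launcher plus the operation only -->
> > >  <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation --launchAutoCrawler</cmd>
> > >
> > >  <!-- 3b: add the file manager and transferer -->
> > >  <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation --launchAutoCrawler \
> > >      --filemgrUrl [FILEMGR_URL] \
> > >      --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory</cmd>
> > >
> > >  <!-- 3c: finally the product path, extractor repo, and action ids -->
> > >  <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation --launchAutoCrawler \
> > >      --filemgrUrl [FILEMGR_URL] \
> > >      --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
> > >      --productPath [JobInputDir] \
> > >      --mimeExtractorRepo [OODT_HOME]/extensions/policy/mime-extractor-map.xml \
> > >      --actionIds MoveFileToLevel0Dir</cmd>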
> > >Thanks,
> > >Rishi
> >
> > On Oct 8, 2014, at 4:24 PM, Mallder, Valerie
> > <[email protected]> wrote:
> >
> > >> Hi Rishi,
> > >>
> > >> Thank you very much for pointing me to your working example. This is
> > >> very helpful. My pgeConfig looks very similar to yours. So, I
> > >> commented out the resource manager like you suggested and tried
> > >> running again without the resource manager. And my problem still
> > >> exists. The problem is that the ExternScriptTaskInstance is unable
> > >> to recognize the command line arguments that I want to pass to the
> > >> crawler_launcher script. Could you send me a link to your tasks.xml
> > >> file? I'm curious as to how you defined your task. My pgeConfig and
> > >> tasks.xml are below.
> > >>
> > >> Thanks!
> > >> Val
> > >>
> > >> <?xml version="1.0" encoding="UTF-8"?>
> > >> <pgeConfig>
> > >>
> > >>   <!-- How to run the PGE -->
> > >>   <exe dir="[JobDir]" shell="/bin/sh" envReplace="true">
> > >>     <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation --launchAutoCrawler \
> > >>       --filemgrUrl [FILEMGR_URL] \
> > >>       --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
> > >>       --productPath [JobInputDir] \
> > >>       --mimeExtractorRepo [OODT_HOME]/extensions/policy/mime-extractor-map.xml \
> > >>       --actionIds MoveFileToLevel0Dir</cmd>
> > >>   </exe>
> > >>
> > >>   <!-- Files to ingest -->
> > >>   <output/>
> > >>
> > >>   <!-- Custom metadata to add to output files -->
> > >>   <customMetadata>
> > >>     <metadata key="JobDir" val="[OODT_HOME]"/>
> > >>     <metadata key="JobInputDir" val="[FEI_DROP_DIR]"/>
> > >>     <metadata key="JobOutputDir" val="[JobDir]/data/pge/jobs"/>
> > >>     <metadata key="JobLogDir" val="[JobDir]/data/pge/logs"/>
> > >>   </customMetadata>
> > >>
> > >> </pgeConfig>
> > >>
> > >> <!-- tasks.xml **************************************************-->
> > >>
> > >> <cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
> > >>
> > >>   <task id="urn:oodt:crawlerLauncherId" name="crawlerLauncherName"
> > >>       class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
> > >>     <conditions/> <!-- There are no pre execution conditions right now -->
> > >>     <configuration>
> > >>       <property name="ShellType" value="/bin/sh" />
> > >>       <property name="PathToScript"
> > >>           value="[CRAWLER_HOME]/bin/crawler_launcher" envReplace="true" />
> > >>       <property name="PGETask_Name" value="crawler_launcher PGE Task"/>
> > >>       <property name="PGETask_ConfigFilePath"
> > >>           value="[OODT_HOME]/extensions/config/crawler-pge-config.xml"
> > >>           envReplace="true" />
> > >>     </configuration>
> > >>   </task>
> > >>
> > >> </cas:tasks>
> > >>
> > >> Valerie A. Mallder
> > >> New Horizons Deputy Mission System Engineer
> > >> Johns Hopkins University/Applied Physics Laboratory
> > >>
> > >>> -----Original Message-----
> > >>> From: Verma, Rishi (398J) [mailto:[email protected]]
> > >>> Sent: Wednesday, October 08, 2014 6:01 PM
> > >>> To: [email protected]
> > >>> Subject: Re: what is batch stub? Is it necessary?
> > >>>
> > >>> Hi Valerie,
> > >>>
> > >>>>>>> All I am trying to do is run "crawler_launcher" as a workflow
> > >>>>>>> task in the CAS PGE environment.
> > >>>
> > >>> Interesting. I have a working example here [1] you can look at that
> > >>> does this exact thing.
> > >>>
> > >>>>>>> So, if "batchstub" is necessary in this scenario, please tell
> > >>>>>>> me what it is, why it is necessary, and how to run it (please
> > >>>>>>> provide the exact syntax to put in my startup shell script,
> > >>>>>>> because I would never be able to figure it out for myself and
> > >>>>>>> I don't want to have to bother everyone again.)
> > >>>
> > >>> Batchstub is only necessary if your Workflow Manager is sending
> > >>> jobs to Resource Manager for execution (where the default is to run
> > >>> the job in something called a "batch stub" executable). Think of
> > >>> batch stubs as a small wrapper program that takes a bundle of
> > >>> executable instructions from Resource Manager, and executes them in
> > >>> a shell environment within a given remote (or local) machine.
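> > >>> As an illustration (a sketch from memory - element names and
> > >>> ports may differ slightly on your version), the Resource Manager
> > >>> finds its batch stubs through the node policy file, e.g. something
> > >>> like $OODT_HOME/resmgr/policy/nodes.xml:
> > >>>
> > >>>   <cas:resourcenodes xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
> > >>>     <!-- hypothetical single-node setup: resmgr hands jobs to a
> > >>>          batch stub listening on localhost:2001 -->
> > >>>     <node nodeId="localhost" ip="http://localhost:2001" capacity="8"/>
> > >>>   </cas:resourcenodes>
> > >>>
> > >>> That URL is why your workflow fails when nothing is listening on
> > >>> http://localhost:2001.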
> > >>> Here's my suggestion:
> > >>>
> > >>> 1. Like Paul suggested, go to $OODT_HOME/resmgr/bin, and execute
> > >>> the following command (it'll start a batch stub in a terminal on
> > >>> port 2001):
> > >>>
> > >>>   ./batch_stub 2001
> > >>>
> > >>> If the above step doesn't fix your problem, you can also try having
> > >>> Workflow Manager NOT send jobs to Resource Manager for execution,
> > >>> and instead execute jobs locally through Workflow Manager itself
> > >>> (on localhost only!). To disable job transfer to Resource Manager,
> > >>> you'll need to modify the Workflow Manager properties file
> > >>> ($OODT_HOME/wmgr/etc/workflow.properties), and specifically comment
> > >>> out the "org.apache.oodt.cas.workflow.engine.resourcemgr.url" line.
> > >>> I've done this in my example code below; see [2] for an exact
> > >>> example of this. After modifying workflow.properties, make sure to
> > >>> restart workflow manager ($OODT_HOME/wmgr/bin/wmgr stop followed by
> > >>> $OODT_HOME/wmgr/bin/wmgr start).
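> > >>> Concretely, that edit is a one-line change - roughly like this
> > >>> (sketch; the resource manager URL/port on your install may differ,
> > >>> 9002 is just the usual default):
> > >>>
> > >>>   # $OODT_HOME/wmgr/etc/workflow.properties
> > >>>   # commented out, so jobs run locally inside Workflow Manager:
> > >>>   #org.apache.oodt.cas.workflow.engine.resourcemgr.url=http://localhost:9002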
> > >>> Thanks,
> > >>> Rishi
> > >>>
> > >>> [1] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/pge/src/main/resources/policy/netscan-getipv4entriesrandomsample.xml
> > >>> [2] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/etc/workflow.properties
> > >>>
> > >>> On Oct 8, 2014, at 2:31 PM, Ramirez, Paul M (398J)
> > >>> <[email protected]> wrote:
> > >>>
> > >>>> Valerie,
> > >>>>
> > >>>> I would have thought it would have just not used a batch stub by
> > >>>> default. That said, if you go into $OODT_HOME/resmgr/bin there
> > >>>> should be a script to start a batch stub. Right now, on my phone,
> > >>>> I forget the name of the script, but if you "more" the file you
> > >>>> will see the Java class name that corresponds to the one below.
> > >>>> You should specify a port when you run the script, which from the
> > >>>> looks of the output below should be 2001.
> > >>>>
> > >>>> HTH,
> > >>>> Paul R
> > >>>>
> > >>>> Sent from my iPhone
> > >>>>
> > >>>>> On Oct 8, 2014, at 2:04 PM, Mallder, Valerie
> > >>>>> <[email protected]> wrote:
> > >>>>>
> > >>>>> Well then, I'm proud to be a member :) (I think .... )
> > >>>>>
> > >>>>> Valerie A. Mallder
> > >>>>> New Horizons Deputy Mission System Engineer
> > >>>>> Johns Hopkins University/Applied Physics Laboratory
> > >>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: Bruce Barkstrom [mailto:[email protected]]
> > >>>>>> Sent: Wednesday, October 08, 2014 4:54 PM
> > >>>>>> To: [email protected]
> > >>>>>> Subject: Re: what is batch stub? Is it necessary?
> > >>>>>>
> > >>>>>> You have every right to bother everyone.
> > >>>>>> You won't get what you need unless you do.
> > >>>>>>
> > >>>>>> You get one honorary membership in the Society of General
> > >>>>>> Agitators - at the rank of Major Agitator.
> > >>>>>>
> > >>>>>> Bruce B.
> > >>>>>>
> > >>>>>> On Wed, Oct 8, 2014 at 4:49 PM, Mallder, Valerie
> > >>>>>> <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> Hello,
> > >>>>>>>
> > >>>>>>> I am still having trouble getting my CAS PGE crawler task to
> > >>>>>>> run due to http://localhost:2001 being "down". I have spent the
> > >>>>>>> last 2 days tracing through the resource manager code and
> > >>>>>>> tracked this down to line 146 of LRUScheduler, where the
> > >>>>>>> XmlRpcBatchMgr is failing to execute the task remotely, because
> > >>>>>>> line 75 of XmlRpcBatchMgrProxy (which was instantiated by
> > >>>>>>> XmlRpcBatchMgr on its line 74) is trying to call "isAlive" on
> > >>>>>>> the web service named "batchstub", which, to my knowledge, is
> > >>>>>>> not running, because I have not done anything explicitly to run
> > >>>>>>> it.
> > >>>>>>>
> > >>>>>>> All I am trying to do is run "crawler_launcher" as a workflow
> > >>>>>>> task in the CAS PGE environment. I had it running perfectly
> > >>>>>>> before I started trying to make it run as part of a workflow. I
> > >>>>>>> really miss my crawler and really want it to run again :(
> > >>>>>>>
> > >>>>>>> So, if "batchstub" is necessary in this scenario, please tell
> > >>>>>>> me what it is, why it is necessary, and how to run it (please
> > >>>>>>> provide the exact syntax to put in my startup shell script,
> > >>>>>>> because I would never be able to figure it out for myself and
> > >>>>>>> I don't want to have to bother everyone again.)
> > >>>>>>>
> > >>>>>>> Thanks so much!
> > >>>>>>>
> > >>>>>>> Val
> > >>>>>>>
> > >>>>>>> Valerie A. Mallder
> > >>>>>>> New Horizons Deputy Mission System Engineer
> > >>>>>>> The Johns Hopkins University/Applied Physics Laboratory
> > >>>>>>> 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
> > >>>>>>> 240-228-7846 (Office) 410-504-2233 (Blackberry)
> > >>>
> > >>> ---
> > >>> Rishi Verma
> > >>> NASA Jet Propulsion Laboratory
> > >>> California Institute of Technology
> > >
> > >---
> > >Rishi Verma
> > >NASA Jet Propulsion Laboratory
> > >California Institute of Technology

-- 
*Lewis*
