---------- Forwarded message ----------
From: MengYing Wang <mengyingwa...@gmail.com>
Date: Thu, Oct 23, 2014 at 10:00 PM
Subject: Re: Directed Research Weekly Report from 2014/09/29 - 2014/10/05
To: "Verma, Rishi (398M)" <rishi.ve...@jpl.nasa.gov>
Cc: Christian Alan Mattmann <mattm...@usc.edu>, "Mcgibbney, Lewis J (398M)" <lewis.j.mcgibb...@jpl.nasa.gov>, "Bryant, Ann C (398G-Affiliate)" <anniebry...@gmail.com>, "Ramirez, Paul M (398M)" <paul.m.rami...@jpl.nasa.gov>, "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>, Tyler Palsulich <tpalsul...@gmail.com>, "u...@oodt.apache.org" <u...@oodt.apache.org>
Dear Rishi,

I followed the new steps to use OODT RADiX. Unfortunately, I got the same "Profile with id: 'fm-solr-catalog' has not been activated" error. Below are my commands and some of the terminal output. Please check them to see whether I have made a mistake, or whether something is wrong with the source code. I really appreciate your help!

Step 1:
$ svn co http://svn.apache.org/repos/asf/oodt/trunk/ oodt_radix
A    oodt_radix/curator
A    oodt_radix/curator/pom.xml
A    oodt_radix/curator/src
A    oodt_radix/curator/src/test
......
Checked out revision 1633738.

Step 2:
$ cd oodt_radix/

Step 3:
$ mvn clean install
[INFO] Scanning for projects...
[INFO] Reactor build order:
[INFO]   OODT Core
[INFO]   Common Utilities
[INFO]   CAS Command Line Interface
......
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5 minutes 29 seconds
[INFO] Finished at: Wed Oct 22 20:40:38 PDT 2014
[INFO] Final Memory: 133M/254M

Step 4:
$ mvn archetype:generate
[INFO] Scanning for projects...
[INFO] Reactor build order:
[INFO]   OODT Core
[INFO]   Common Utilities
......
[INFO] project created from Old (1.x) Archetype in dir: /Users/AngelaWang/Downloads/oodt_radix/radix-archetype
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1 minute 40 seconds
[INFO] Finished at: Wed Oct 22 20:52:46 PDT 2014
[INFO] Final Memory: 36M/84M
[INFO] ------------------------------------------------------------------------

Step 5:
$ cd radix-archetype/

Step 6:
$ mvn clean package -Pfm-solr-catalog
[INFO] Scanning for projects...
[WARNING] Profile with id: 'fm-solr-catalog' has not been activated.
[INFO] ------------------------------------------------------------------------
[INFO] Building radix-archetype
[INFO]    task-segment: [clean, package]
[INFO] ------------------------------------------------------------------------
.....

Best,
Mengying (Angela) Wang

On Sat, Oct 18, 2014 at 3:49 PM, Verma, Rishi (398M) <rishi.ve...@jpl.nasa.gov> wrote:
> Hi MengYing,
>
> Your CMD1 should not have the '-Pfm-solr-catalog' argument. The reason is that that command *generates* a new project for you, whereas '-Pfm-solr-catalog' should only be used to *build* the project once it has already been generated. You might want to read up a bit on Maven archetypes, which is what OODT RADiX is:
> http://maven.apache.org/guides/introduction/introduction-to-archetypes.html
>
> Let me explain it this way; here are the steps to using OODT RADiX:
> 1. Get hold of the latest OODT RADiX Maven archetype (you might have already done this if you have the full OODT source), i.e. download the full OODT source and invoke 'mvn install' so that you can use the latest RADiX archetype:
>    http://svn.apache.org/repos/asf/oodt/trunk/
> 2. Use the OODT RADiX Maven archetype to *generate* a new OODT project source folder structure for you (this is the source for your new project!), i.e. invoke the command:
>    > mvn archetype:generate
>    (select RADiX from the list of archetypes you see, and follow the prompts)
> 3. Change into the newly generated directory from above, and *build* a tar-ball distribution of OODT that you can run from the source folder structure you generated earlier:
>    > mvn clean package -Pfm-solr-catalog
> 4. Take the built tar-ball distribution and extract it somewhere else for launching OODT:
>    > tar zxf distribution/target/oodt-*.tar.gz -C /usr/local/my-oodt-project
> 5. Run OODT:
>    > cd /usr/local/my-oodt-project/bin
>    > ./oodt start
>
> That's the typical workflow for using RADiX.
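Rishi's five steps can be condensed into one shell sketch. The Maven and Subversion steps are left as comments (they need a network and an OODT checkout); the step-4 extraction pattern is demonstrated live against a throwaway tar-ball, since the exact artifact name under distribution/target/ varies by version:

```shell
# Steps 1-3 need a network and a Maven toolchain, so they are comments here:
#   svn co http://svn.apache.org/repos/asf/oodt/trunk/ oodt-src
#   (cd oodt-src && mvn clean install)               # 1: install the archetype
#   mvn archetype:generate                           # 2: generate (no -P flag!)
#   (cd <your-artifactId> && mvn clean package -Pfm-solr-catalog)   # 3: build
#
# Step 4's extraction pattern, run here against a throwaway tar-ball
# (the real artifact sits under distribution/target/ in your project):
mkdir -p /tmp/radix-demo/stage/bin /tmp/radix-demo/deploy /tmp/radix-demo/distribution/target
cd /tmp/radix-demo
echo 'exit 0' > stage/bin/oodt                       # stand-in for the real launcher
tar czf distribution/target/oodt-0.1-dist.tar.gz -C stage bin
tar zxf distribution/target/oodt-*.tar.gz -C deploy  # same shape as step 4
ls deploy/bin                                        # -> oodt
# Step 5 (real deployment only): cd deploy/bin && ./oodt start
```

The point of the `-C` flag is that the tar-ball unpacks into the deployment directory you choose, leaving the generated source tree untouched.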
So the key here is, only > use the ‘-Pfm-solr-catalog’ argument when *building* OODT, not when > *generating* the folder structure. > > *If you’re starting from scratch:* > 1. Use Vagrant Virtual Machine technology to get a pre-built OODT > deployment connected to Solr in one command: > https://cwiki.apache.org/confluence/display/OODT/Vagrant+Powered+OODT > > [ I didn't try this approach ] > > > You should try this! Because all five steps above are automated for > you via the Vagrant machine. > > Thanks, > rishi > > On Oct 17, 2014, at 11:29 AM, MengYing Wang <mengyingwa...@gmail.com> > wrote: > > Dear Rishi, > > Actually, in the first command of the tutorial > <https://cwiki.apache.org/confluence/display/OODT/RADiX+Powered+By+OODT#RADiXPoweredByOODT-TheCommands>: > curl > -s > http://svn.apache.org/repos/asf/oodt/trunk/mvn/archetypes/radix/src/main/resources/bin/radix > | bash, the "default" instead of the "fm-solr-catalog" profile is already > activated. So the Solr component is not built. > > guest-wireless-207-151-035-005:Downloads AngelaWang$ curl -s > http://svn.apache.org/repos/asf/oodt/trunk/mvn/archetypes/radix/src/main/resources/bin/radix > | bash > [INFO] Scanning for projects... > [INFO] Searching repository for plugin with prefix: 'archetype'. > [INFO] > ------------------------------------------------------------------------ > [INFO] Building Maven Default Project > [INFO] task-segment: [archetype:generate] (aggregator-style) > [INFO] > ------------------------------------------------------------------------ > [INFO] Preparing archetype:generate > [INFO] No goals needed for project - skipping > ...... > > > BTW, the "fm-solr-catalog" profile is defined in the filemgr pom.xml > <http://svn.apache.org/repos/asf/oodt/trunk/mvn/archetypes/radix/src/main/resources/archetype-resources/filemgr/pom.xml> > and distribution pom.xml > <http://svn.apache.org/repos/asf/oodt/trunk/mvn/archetypes/radix/src/main/resources/archetype-resources/distribution/pom.xml> > . 
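MengYing's observation above is the crux: the 'fm-solr-catalog' profile is declared only in the generated project's filemgr and distribution pom.xml files, not in the OODT trunk reactor, so Maven warns whenever the flag is passed to a build that does not contain those POMs. A self-contained sketch of how to locate profile declarations (the mini POM below stands in for a real RADiX checkout):

```shell
# Simulate a RADiX-style module POM that declares the profile.
mkdir -p /tmp/pom-demo/filemgr
cat > /tmp/pom-demo/filemgr/pom.xml <<'EOF'
<project>
  <profiles>
    <profile><id>fm-solr-catalog</id></profile>
  </profiles>
</project>
EOF
# List every module POM declaring the profile -- run the same grep from the
# top of a real checkout to see which directory Maven must be invoked from:
grep -rl '<id>fm-solr-catalog</id>' /tmp/pom-demo --include=pom.xml
```

In a real generated project, `mvn help:all-profiles` run from the top-level directory also lists every profile Maven can see from there.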
>
> Best,
> Mengying (Angela) Wang
>
> On Thu, Oct 16, 2014 at 7:42 PM, MengYing Wang <mengyingwa...@gmail.com> wrote:
>
>> Dear Rishi,
>>
>> Thank you for your help.
>>
>> Yes, I am using OODT 0.7 and running the 'mvn package -Pfm-solr-catalog' command from the top-level directory.
>>
>> Following are the commands and logs:
>>
>> Cmd 1:
>> guest-wireless-207-151-035-005:Downloads AngelaWang$ mvn archetype:generate -Pfm-solr-catalog -DarchetypeGroupId=org.apache.oodt -DarchetypeArtifactId=radix-archetype -DarchetypeVersion=0.6 -Doodt=0.7 -DgroupId=com.mycompany -DartifactId=oodt -Dversion=0.1
>> [INFO] Scanning for projects...
>> [WARNING]
>> Profile with id: 'fm-solr-catalog' has not been activated.
>> ......
>> [INFO] BUILD SUCCESSFUL
>> ......
>>
>> Cmd 2:
>> guest-wireless-207-151-035-005:Downloads AngelaWang$ cd oodt
>>
>> Cmd 3:
>> guest-wireless-207-151-035-005:oodt AngelaWang$ mvn clean package -Pfm-solr-catalog
>> [INFO] Scanning for projects...
>> [WARNING]
>> Profile with id: 'fm-solr-catalog' has not been activated.
>> [INFO] Reactor build order:
>> [INFO]   Data Management System
>> [INFO]   Extensions
>> ......
>>
>> Thank you.
>>
>> Best,
>> Mengying Wang
>>
>> On Thu, Oct 16, 2014 at 6:09 PM, Verma, Rishi (398M) <rishi.ve...@jpl.nasa.gov> wrote:
>>
>>> Hey Mengying,
>>>
>>> That error usually gets thrown if you invoke a Maven build from a subdirectory that does not contain the profile definition.
>>>
>>> Two things to check:
>>> * Are you calling 'mvn clean package -Pfm-solr-catalog' from the top-level directory of your RADiX installation? i.e. the directory containing a pom.xml file and folders like 'crawler', 'distribution', 'extensions', etc ...
>>> * Are you running OODT version 0.7+?
>>> >>> Thanks, >>> rishi >>> >>> On Oct 16, 2014, at 4:45 PM, MengYing Wang <mengyingwa...@gmail.com> >>> wrote: >>> >>> Dear Rishi, >>> >>> When I try to use the OODT RADiX using the command "mvn clean package >>> -Pfm-solr-catalog", I get the "profile with id: 'fm-solr-catalog' has >>> not been activated" error. Have you by any chance seen this error before? >>> Thank you! Also after the installation, no solr directory is found in my >>> machine too. >>> >>> $ mvn clean package -Pfm-solr-catalog >>> >>> [INFO] Scanning for projects... >>> >>> [WARNING] >>> >>> Profile with id: 'fm-solr-catalog' has not been activated. >>> >>> [INFO] Reactor build order: >>> >>> [INFO] Data Management System >>> >>> ...... >>> Best, >>> Mengying Wang >>> >>> On Sun, Oct 5, 2014 at 3:05 PM, Verma, Rishi (398J) < >>> rishi.ve...@jpl.nasa.gov> wrote: >>> >>>> Hi Mengying, >>>> >>>> For integrating OODT File Manager with Solr, you have a couple >>>> options depending on the type of deployment you are doing and what stage >>>> your software is at: >>>> >>>> *If you’re starting from scratch:* >>>> 1. Use Vagrant Virtual Machine technology to get a pre-built OODT >>>> deployment connected to Solr in one command: >>>> https://cwiki.apache.org/confluence/display/OODT/Vagrant+Powered+OODT >>>> 2. Use OODT RADiX for a pre-built deployment directory containing OODT >>>> File Manager, Workflow, Resource etc and Solr pre-integrated. RADiX allows >>>> for pre-configured OODT deployments, abstracting you from checking out >>>> individual OODT modules via source and building them. >>>> See: >>>> https://cwiki.apache.org/confluence/display/OODT/RADiX+Powered+By+OODT#RADiXPoweredByOODT-TheCommands >>>> Make sure to build with the command: mvn -Pfm-solr-catalog package >>>> (see read me: >>>> http://svn.apache.org/repos/asf/oodt/trunk/mvn/archetypes/radix/src/main/resources/archetype-resources/README.txt >>>> ) >>>> 3. 
Connect OODT FM with Solr manually, see: >>>> https://cwiki.apache.org/confluence/display/OODT/Integrating+Solr+with+OODT+RADiX >>>> >>>> *If you already have a deployed OODT FM:* >>>> 1. Follow these directions: >>>> https://cwiki.apache.org/confluence/display/OODT/Solr+File+Manager+Quick+Start+Guide >>>> 2. If the above doesn’t work, then use OODT RADiX to create a FM and >>>> Solr deployment that works, and copy those directories to your currently >>>> deployed production directory. >>>> >>>> Thanks - hope that helps! >>>> Rishi >>>> >>>> On Oct 5, 2014, at 10:56 AM, MengYing Wang <mengyingwa...@gmail.com> >>>> wrote: >>>> >>>> Dear Prof. Mattmann and Rishi, >>>> >>>> Attached is the nutch and solr directories. >>>> >>>> nutch_solr.zip >>>> <https://docs.google.com/file/d/0B7PYVKDpy0jlSnI3U1lFcGY0WnM/edit?usp=drive_web> >>>> >>>> As for problem (6), I could use SolrIndexer instead. Following is my >>>> File Manager directory. >>>> >>>> >>>> https://drive.google.com/file/d/0B7PYVKDpy0jlVTk2NWFFY2sycW8/view?usp=sharing >>>> >>>> Thank you! >>>> >>>> Best, >>>> Mengying Wang >>>> >>>> >>>> >>>> On Sun, Oct 5, 2014 at 9:25 AM, Christian Alan Mattmann < >>>> mattm...@usc.edu> wrote: >>>> >>>>> Thanks Angela. Great work! >>>>> >>>>> Some comments/feedback: >>>>> >>>>> (1) According to >>>>> >>>>> https://cwiki.apache.org/confluence/display/OODT/OODT+Push-Pull+User+Guide >>>>> , >>>>> use the Apache OODT Pushpull to crawl data files from >>>>> a remote server to the local machine [Failed, no data files downloaded >>>>> at >>>>> all]. >>>>> - This problem is not so urgent. Maybe I should use some ftp client >>>>> tools, >>>>> e.g., FileZilla, to download data files in the remote ftp servers. >>>>> >>>>> MY COMMENT: Please send me your PushPull directory zipped up. I will >>>>> take a look - Tyler can you also look? 
>>>>> >>>>> (3) According to https://wiki.apache.org/nutch/IntranetDocumentSearch, >>>>> use >>>>> the Apache Nutch and Solr to crawl and index local data files [Failed, >>>>> No data is indexed in Solr]. >>>>> - This problem is not so urgent. Maybe this feature >>>>> only works for the Nutch 2.x. My Nutch version is 1.9. Also I could use >>>>> the OODT Crawler to ingest local files. >>>>> >>>>> >>>>> MY COMMENT: Please send me your nutch + solr directories, zipped up. >>>>> I will take a look. >>>>> >>>>> (6) According to >>>>> >>>>> https://cwiki.apache.org/confluence/display/OODT/Solr+File+Manager+Quick+St >>>>> art+Guide, integrate the Apache OODT File Manager >>>>> with the Apache Solr [Failed, No product information available in the >>>>> Solr]. >>>>> - It doesn't work out. However, I could use (5) to integrate OODT File >>>>> Manager and the Solr. >>>>> >>>>> >>>>> MY COMMENT: Rishi, can you guys help Angela with OODT + Solr + FM? >>>>> It¹s not >>>>> working for her. >>>>> >>>>> Thanks! >>>>> >>>>> >>>>> >>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> Chris Mattmann, Ph.D. >>>>> Adjunct Associate Professor, Computer Science Department >>>>> University of Southern California >>>>> Los Angeles, CA 90089 USA >>>>> Email: mattm...@usc.edu >>>>> WWW: http://sunset.usc.edu/~mattmann/ >>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >>>>> >>>>> >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: MengYing Wang <mengyingwa...@gmail.com> >>>>> Date: Saturday, October 4, 2014 at 9:12 PM >>>>> To: Chris Mattmann <mattm...@usc.edu>, "Mcgibbney, Lewis J (398J)" >>>>> <lewis.j.mcgibb...@jpl.nasa.gov> >>>>> Cc: Annie Bryant <anniebry...@gmail.com>, "Ramirez, Paul M (398J)" >>>>> <paul.m.rami...@jpl.nasa.gov>, Chris Mattmann >>>>> <chris.a.mattm...@jpl.nasa.gov> >>>>> Subject: Directed Research Weekly Report from 2014/09/29 - 2014/10/05 >>>>> >>>>> >Dear Prof. 
Mattmann,
>>>>> >
>>>>> >New status of the previously failed problems:
>>>>> >
>>>>> >(1) According to https://cwiki.apache.org/confluence/display/OODT/OODT+Push-Pull+User+Guide, use the Apache OODT PushPull to crawl data files from a remote server to the local machine [Failed, no data files downloaded at all].
>>>>> >- This problem is not so urgent. Maybe I should use some FTP client tools, e.g., FileZilla, to download data files from the remote FTP servers.
>>>>> >
>>>>> >(2) Use the Apache OODT PushPull to crawl webpages [Succeeded].
>>>>> >
>>>>> >(3) According to https://wiki.apache.org/nutch/IntranetDocumentSearch, use Apache Nutch and Solr to crawl and index local data files [Failed, no data is indexed in Solr].
>>>>> >- This problem is not so urgent. Maybe this feature only works for Nutch 2.x; my Nutch version is 1.9. Also, I could use the OODT Crawler to ingest local files.
>>>>> >
>>>>> >(4) Integrate the Tika parser with Apache Nutch [Failed, no Tika fields available in Solr].
>>>>> >- Still in progress.
>>>>> >
>>>>> >(5) According to https://cwiki.apache.org/confluence/display/OODT/Using+the+SolrIndexer+to+dump+a+File+Manager+Catalog, use the SolrIndexer to dump all product information from the Apache OODT File Manager to Apache Solr [Succeeded].
>>>>> >
>>>>> >(6) According to https://cwiki.apache.org/confluence/display/OODT/Solr+File+Manager+Quick+Start+Guide, integrate the Apache OODT File Manager with Apache Solr [Failed, no product information available in Solr].
>>>>> >- It doesn't work out. However, I could use (5) to integrate the OODT File Manager and Solr.
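For item (6), the quick-start guide's integration ultimately comes down to pointing the File Manager's catalog factory at Solr. Below is a sketch of the relevant filemgr.properties fragment, written to a scratch path so it can run anywhere; the property names are assumptions drawn from that guide, so verify them against your OODT version:

```shell
# Scratch copy; in a real deployment you would edit filemgr/etc/filemgr.properties.
mkdir -p /tmp/fm-demo/filemgr/etc
cat > /tmp/fm-demo/filemgr/etc/filemgr.properties <<'EOF'
# Swap the default Lucene catalog for the Solr-backed one (names assumed):
filemgr.catalog.factory=org.apache.oodt.cas.filemgr.catalog.solr.SolrCatalogFactory
org.apache.oodt.cas.filemgr.catalog.solr.url=http://localhost:8983/solr
EOF
grep 'catalog' /tmp/fm-demo/filemgr/etc/filemgr.properties
```

After a change like this, the File Manager must be restarted for products to start landing in Solr rather than the Lucene index.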
>>>>> >
>>>>> >So far, I have two ways to crawl remote data and construct indexes in Solr:
>>>>> >
>>>>> >(1) Moving data to the local machine using FileZilla -> developing a metadata extractor using Tika -> crawling the data directory using the OODT Crawler -> migrating product information to Solr using the SolrIndexer.
>>>>> >
>>>>> >(2) Crawling websites using Nutch -> indexing some basic metadata in Solr.
>>>>> >
>>>>> >Thanks.
>>>>> >
>>>>> >Best,
>>>>> >Mengying (Angela) Wang
>>>>> >
>>>>> >On Mon, Sep 29, 2014 at 12:22 PM, MengYing Wang <mengyingwa...@gmail.com> wrote:
>>>>> >
>>>>> >Dear Prof. Mattmann,
>>>>> >
>>>>> >In the previous two weeks, I was trying to solve the following problems:
>>>>> >
>>>>> >(1) According to https://cwiki.apache.org/confluence/display/OODT/OODT+Push-Pull+User+Guide, use the Apache OODT PushPull to crawl data files from a remote server to the local machine [Failed, couldn't find the data files].
>>>>> >
>>>>> >(2) Use the Apache OODT PushPull to crawl webpages [Failed, HttpClient ClassNotFoundException].
>>>>> >
>>>>> >(3) According to https://wiki.apache.org/nutch/IntranetDocumentSearch, use Apache Nutch and Solr to crawl and index local data files [Failed, no data files found in Solr].
>>>>> >
>>>>> >(4) According to https://cwiki.apache.org/confluence/display/OODT/Solr+File+Manager+Quick+Start+Guide, search and delete redundant products in the Apache OODT File Manager [Succeeded].
>>>>> >
>>>>> >(5) According to https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help, use the Apache OODT Crawler and Tika to extract metadata and then query the metadata in the Apache OODT File Manager [Succeeded].
>>>>> >
>>>>> >(6) According to https://wiki.apache.org/nutch/IndexMetatags, use the plugins to parse HTML meta tags into separate fields in the Solr index [Succeeded].
>>>>> >
>>>>> >(7) Integrate the Tika parser with Apache Nutch to extract metadata to be indexed in Solr [Failed, no Tika fields available in Solr].
>>>>> >
>>>>> >(8) According to https://cwiki.apache.org/confluence/display/OODT/Using+the+SolrIndexer+to+dump+a+File+Manager+Catalog, use the SolrIndexer to dump all product information from the Apache OODT File Manager to Apache Solr [Failed, no product information available in Solr].
>>>>> >
>>>>> >(9) According to https://cwiki.apache.org/confluence/display/OODT/Solr+File+Manager+Quick+Start+Guide, integrate the Apache OODT File Manager with Apache Solr [Failed, no product information available in Solr].
>>>>> >
>>>>> >(10) According to https://lucene.apache.org/solr/4_10_0/tutorial.html, explore a simple command-line tool for posting, deleting, updating, and querying raw XML against the Solr server [Succeeded].
>>>>> >
>>>>> >Thank you.
>>>>> >
>>>>> >Best,
>>>>> >Mengying Wang
>>>>> >
>>>>> >On Wed, Sep 17, 2014 at 11:44 AM, MengYing Wang <mengyingwa...@gmail.com> wrote:
>>>>> >
>>>>> >Dear Prof. Mattmann,
>>>>> >
>>>>> >For the last week, I was learning the various Apache tool tutorials and trying to figure out how to crawl data files on the web and then build up a metadata index for future queries.
So far, I have found the >>>>> >following two approaches: >>>>> > >>>>> > >>>>> >1: Use the Apache OODT Pushpull to crawl a bunch of data files from >>>>> some >>>>> >remote server to localhost -> Use the Apache Tika to extract the >>>>> >metadata information for each data file -> Use the Apache OODT File >>>>> >Manager to ingest the metadata files -> Use >>>>> > the query_tool script to query the metadata information stored in the >>>>> >Apache OODT File Manager >>>>> > >>>>> > >>>>> >We could also achieve the above process by employing the Apache OODT >>>>> >CAS-Curator to automatically call the Apache Tika and the Apache File >>>>> >Manager, for the details you could refer to >>>>> >http://oodt.apache.org/components/maven/curator/user/basic.html >>>>> > >>>>> > >>>>> >2: Use the Apache Nutch to crawl a number of webpages -> Use the >>>>> Apache >>>>> >Solr to do the text queries. >>>>> > >>>>> > >>>>> >However, there are some problems that I am still trying to solve: >>>>> > >>>>> > >>>>> >(1) According to the Apache OODT Pushpull user guide >>>>> >( >>>>> https://cwiki.apache.org/confluence/display/OODT/OODT+Push-Pull+User+Guid >>>>> >e), data files should >>>>> > be downloaded to the staging area. However, when I started the >>>>> pushpull >>>>> >script, I have waited for at least 15 minutes but nothing was >>>>> downloaded. >>>>> >I have checked the remote FTP server, there indeed are some data >>>>> files. >>>>> >-_-! >>>>> > >>>>> > >>>>> >>>>> >************************************************************************** >>>>> >*********** >>>>> >guest-wireless-207-151-035-013:bin AngelaWang$ ./pushpull >>>>> >TRANSFER: >>>>> >org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory >>>>> >^C >>>>> >>>>> >************************************************************************** >>>>> >*********** >>>>> > >>>>> > >>>>> >Also, url-downloader script cannot work because of the java >>>>> > NoClassDefFoundError. 
>>>>> > >>>>> > >>>>> >>>>> >************************************************************************** >>>>> >*********** >>>>> > >>>>> >guest-wireless-207-151-035-013:bin AngelaWang$ ./url-downloader >>>>> > >>>>> > >>>>> http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/CATINFO.TXT >>>>> >< >>>>> http://pds-imaging.jpl.nasa.gov/data/msl/MSLHAZ_0XXX/CATALOG/CATINFO.TXT >>>>> > >>>>> > . >>>>> >Exception in thread "main" java.lang.NoClassDefFoundError: >>>>> >org/apache/oodt/cas/pushpull/protocol/http/HttpClient >>>>> >Caused by: java.lang.ClassNotFoundException: >>>>> >org.apache.oodt.cas.pushpull.protocol.http.HttpClient >>>>> >at java.net.URLClassLoader$1.run(URLClassLoader.java:202) >>>>> >at java.security.AccessController.doPrivileged(Native Method) >>>>> >at java.net.URLClassLoader.findClass(URLClassLoader.java:190) >>>>> >at java.lang.ClassLoader.loadClass(ClassLoader.java:306) >>>>> >at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) >>>>> >at java.lang.ClassLoader.loadClass(ClassLoader.java:247) >>>>> > >>>>> >>>>> >************************************************************************** >>>>> >*********** >>>>> > >>>>> > >>>>> > >>>>> >2: According to the Apache OODT Crawler Help >>>>> >(https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help), >>>>> the >>>>> >Apache OODT Crawler could be integrated >>>>> > with the Apache Tika. However, there is no >>>>> >org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor class >>>>> in >>>>> >my Apache OODT Crawler package. >>>>> > >>>>> > >>>>> >3: How to dump the metadata in the Apache OODT File Manager to the >>>>> Apache >>>>> >Solr using the Apache OODT Workflow Manager? I still have no clear >>>>> answer >>>>> >yet. >>>>> > >>>>> > >>>>> >4: According to the Apache Solr Tutorial >>>>> >(https://lucene.apache.org/solr/4_10_0/tutorial.html), users should >>>>> be >>>>> >able to add/delete/update documents using post.jar script. 
>>>>> > However, it doesn't work in my machine. >>>>> > >>>>> > >>>>> >>>>> >************************************************************************** >>>>> >*********** >>>>> > >>>>> >guest-wireless-207-151-035-013:exampledocs AngelaWang$ java -jar >>>>> post.jar >>>>> >solr.xml >>>>> >SimplePostTool version 1.5 >>>>> >Posting files to base url >>>>> >http://localhost:8983/solr/update <http://localhost:8983/solr/update >>>>> > >>>>> >using content-type application/xml.. >>>>> >POSTing file solr.xml >>>>> >SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for >>>>> >url: >>>>> >http://localhost:8983/solr/update >>>>> >SimplePostTool: WARNING: Response: <?xml version="1.0" >>>>> encoding="UTF-8"?> >>>>> ><response> >>>>> ><lst name="responseHeader"><int name="status">400</int><int >>>>> >name="QTime">1</int></lst><lst name="error"><str name="msg">ERROR: >>>>> >[doc=SOLR1000] unknown field 'name'</str><int >>>>> name="code">400</int></lst> >>>>> ></response> >>>>> >SimplePostTool: WARNING: IOException while reading response: >>>>> >java.io.IOException: Server returned HTTP response code: 400 for URL: >>>>> >http://localhost:8983/solr/update >>>>> >1 files indexed. >>>>> >COMMITting Solr index changes to >>>>> >http://localhost:8983/solr/update.. 
>>>>> >Time spent: 0:00:00.032 >>>>> > >>>>> >>>>> >************************************************************************** >>>>> >*********** >>>>> > >>>>> > >>>>> > >>>>> >Solr logs: >>>>> > >>>>> > >>>>> >>>>> >************************************************************************** >>>>> >*********** >>>>> > >>>>> >6506114 [qtp1314570047-14] ERROR org.apache.solr.core.SolrCore >>>>> >org.apache.solr.common.SolrException: ERROR: [doc=SOLR1000] unknown >>>>> field >>>>> >'name' >>>>> >at >>>>> >>>>> >org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:185 >>>>> >) >>>>> >at >>>>> >>>>> >org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand >>>>> >.java:78) >>>>> >at >>>>> >>>>> >org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.j >>>>> >ava:238) >>>>> >at >>>>> >>>>> >org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.ja >>>>> >va:164) >>>>> >at >>>>> >>>>> >org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdatePr >>>>> >ocessorFactory.java:69) >>>>> >....... >>>>> >>>>> >************************************************************************** >>>>> >*********** >>>>> > >>>>> > >>>>> > >>>>> >I will continue to solve the above problems this week, and could we >>>>> >discuss the two approaches this Thursday after the class? Many thanks! >>>>> >Have a good day! >>>>> > >>>>> > >>>>> >Best, >>>>> >Mengying (Angela) Wang >>>>> > >>>>> > >>>>> > >>>>> >On Mon, Sep 8, 2014 at 10:32 PM, MengYing Wang >>>>> ><mengyingwa...@gmail.com> wrote: >>>>> > >>>>> >Dear Prof. 
Mattmann,
>>>>> >
>>>>> >Over the previous week, I successfully installed the following software on my personal computer:
>>>>> >
>>>>> >1: Apache OODT Catalog and Archive File Management Component: http://oodt.apache.org/components/maven/filemgr/user/basic.html
>>>>> >2: Apache OODT Catalog and Archive Crawling Framework: http://oodt.apache.org/components/maven/crawler/user/
>>>>> >3: Apache OODT Catalog and Archive Workflow Management Component: http://oodt.apache.org/components/maven/workflow/user/basic.html
>>>>> >4: Apache Solr: https://cwiki.apache.org/confluence/display/solr/Installing+Solr
>>>>> >5: Apache Nutch: http://wiki.apache.org/nutch/NutchTutorial#A3._Crawl_your_first_website
>>>>> >6: Apache Tika: http://tika.apache.org/0.9/gettingstarted.html
>>>>> >
>>>>> >This week I will continue playing with this software to figure out the following three questions:
>>>>> >(1) How to get the metadata using Apache OODT or Apache Nutch?
>>>>> >(2) How to dump the metadata from Apache OODT to Apache Solr?
>>>>> >(3) How to query the metadata stored in Solr?
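For question (3), once metadata has landed in Solr it can be queried over plain HTTP via the standard /select handler. A sketch that composes the query URL (the CAS.ProductName field is an assumption about what the SolrIndexer emits; substitute any field your index actually contains):

```shell
# Compose a query URL for Solr's standard /select handler.
SOLR_URL="http://localhost:8983/solr"
QUERY="CAS.ProductName:*"    # assumed field name; use any indexed field
echo "${SOLR_URL}/select?q=${QUERY}&wt=json&indent=true"
# With Solr running, fetch the results:
#   curl "${SOLR_URL}/select?q=${QUERY}&wt=json&indent=true"
```

`wt=json` selects the JSON response writer; dropping it returns the default XML response shown in the Solr tutorial.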
>>>>> >
>>>>> >Best,
>>>>> >Mengying (Angela) Wang
>>>>
>>>> --
>>>> Best,
>>>> Mengying (Angela) Wang
>>>>
>>>> ---
>>>> Rishi Verma
>>>> NASA Jet Propulsion Laboratory
>>>> California Institute of Technology
>>>> 4800 Oak Grove Drive, M/S 158-248
>>>> Pasadena, CA 91109
>>>> Tel: 1-818-393-5826

--
Best,
Mengying (Angela) Wang