[Nutch Wiki] Update of "FAQ" by Gal Nitzan

Apache Wiki Thu, 22 Sep 2005 23:32:56 -0700


The following page has been changed by Gal Nitzan:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
    First, copy the .war file to the servlet container's webapps 
folder.
       % cp nutch-0.7.war $CATALINA_HOME/webapps/ROOT.war
  
-   * After building your first index, start Tomcat from the index folder.
+   1) After building your first index, start Tomcat from the index folder.
-     Assuming your index is located at /index/db/:
+     Assuming your index is located at /index:
-     {{{% cd /index/db/
+     {{{% cd /index/
  % $CATALINA_HOME/bin/startup.sh}}}
-   * After building your first index, start Tomcat from the index folder.
-     Start Tomcat
+     Now you can search.
+   2) After building your first index, start Tomcat and then stop it, which 
makes Tomcat extract the Nutch webapp. Then edit nutch-site.xml and set the 
location of the index folder in it.
-       % $CATATALINA_HOME/bin/startup.sh
+     {{{% $CATALINA_HOME/bin/startup.sh
-     Stop Tomcat
-       % $CATATALINA_HOME/bin/startup.sh
+ % $CATALINA_HOME/bin/shutdown.sh}}}
-     Tomcat has extracted the contens of the ROOT.war file
-     Edit the nutch-default.xml which is located at:
-        $CATATALINA_HOME/bin/webapps/ROOT/WEB-INF/classes/
-        look for the entry: searcher.dir and replace it with your index 
location /index/db
  
+     {{{% vi $CATALINA_HOME/webapps/ROOT/WEB-INF/classes/nutch-site.xml
+ <?xml version="1.0"?>
+ <?xml-stylesheet type="text/xsl" href="nutch-conf.xsl"?>
+ 
+ <!-- Put site-specific property overrides in this file. -->
+ 
+ <nutch-conf>
+ 
+ <property>
+   <name>searcher.dir</name>
+   <value>/your_index_folder_path</value>
+ </property>
+ 
+ </nutch-conf>}}}
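
The nutch-site.xml edit above can also be scripted instead of done by hand in vi. The following is a minimal sketch, not part of the FAQ itself: the function name `write_nutch_site` is hypothetical, and it assumes the index path contains no characters that would need XML escaping.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal nutch-site.xml that overrides
# searcher.dir. The property name and file layout come from the FAQ text;
# the function name and argument order are assumptions for illustration.
write_nutch_site() {
    index_dir="$1"   # location of the index folder, e.g. /index
    conf_file="$2"   # path of the nutch-site.xml file to write
    cat > "$conf_file" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="nutch-conf.xsl"?>

<nutch-conf>

<property>
  <name>searcher.dir</name>
  <value>$index_dir</value>
</property>

</nutch-conf>
EOF
}
```

Assuming the WAR was unpacked under $CATALINA_HOME/webapps/ROOT, it could be called as `write_nutch_site /index "$CATALINA_HOME/webapps/ROOT/WEB-INF/classes/nutch-site.xml"` after Tomcat has been stopped.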
  ==== I have two XML files, nutch-default.xml and nutch-site.xml, why? ====
  
  nutch-default.xml is the out-of-the-box configuration for Nutch. Most 
configuration can (and should, unless you know what you're doing) stay as it is.
@@ -62, +73 @@

  
  ==== How can I recover an aborted fetch process? ====
  
- You have two choices:
-    1) Use the aborted output.
+ You cannot resume it. However, you have two ways to proceed:
+    1) Recover the pages already fetched and then restart the fetcher.
-       * You'll need to touch the file fetcher.done in the segment directory. 
All the pages that were not crawled will be re-generated for fetch pretty soon. 
If you fetched lots of pages, and don't want to have to re-fetch them again, 
this is the best way.
+       * Create an empty file called fetcher.done in the segment directory: 
% touch index/yoursegdir/fetcher.done . All the pages that were not crawled 
will be regenerated for fetching soon. If you fetched lots of pages and don't 
want to fetch them again, this is the best way.
     2) Discard the aborted output.
        * Delete all folders from the segment folder except the fetchlist 
folder and restart the fetcher.
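
The two recovery choices above can be sketched as small shell functions. This is an illustration under stated assumptions, not an official Nutch tool: the function names are hypothetical, and the segment layout (a fetchlist subfolder and a fetcher.done marker file) is taken only from the description above.

```shell
#!/bin/sh
# Hypothetical helpers for an aborted fetch. The names mark_segment_done
# and discard_fetch_output are made up for this sketch.

# Choice 1: keep what was fetched by creating the fetcher.done marker.
mark_segment_done() {
    seg="$1"
    [ -d "$seg" ] || { echo "no such segment: $seg" >&2; return 1; }
    touch "$seg/fetcher.done"
}

# Choice 2: discard the aborted output by deleting every subfolder of the
# segment except fetchlist. Plain files in the segment are left alone.
discard_fetch_output() {
    seg="$1"
    [ -d "$seg/fetchlist" ] || { echo "no fetchlist in: $seg" >&2; return 1; }
    for entry in "$seg"/*; do
        [ -d "$entry" ] || continue
        [ "$(basename "$entry")" = "fetchlist" ] && continue
        rm -rf "$entry"
    done
}
```

For example, `mark_segment_done index/yoursegdir` corresponds to the touch command in choice 1, and `discard_fetch_output index/yoursegdir` prepares the segment for restarting the fetcher as in choice 2.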
