Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by Gal Nitzan:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  First you need to copy the .WAR file to the servlet container webapps folder.
  % cp nutch-0.7.war $CATALINA_HOME/webapps/ROOT.war

- * After building your first index, start Tomcat from the index folder.
+ 1) After building your first index, start Tomcat from the index folder.
- Assuming your index is located at /index/db/:
+ Assuming your index is located at /index:
- {{{% cd /index/db/
+ {{{% cd /index/
  % $CATALINA_HOME/bin/startup.sh}}}
- * After building your first index, start Tomcat from the index folder.
- Start Tomcat
+ Now you can search.
+ 2) After building your first index, start Tomcat and then stop it, which makes Tomcat extract the Nutch webapp. Then edit nutch-site.xml and put the location of the index folder in it.
- % $CATALINA_HOME/bin/startup.sh
+ {{{% $CATALINA_HOME/bin/startup.sh
- Stop Tomcat
- % $CATALINA_HOME/bin/shutdown.sh
+ % $CATALINA_HOME/bin/shutdown.sh}}}
- Tomcat has extracted the contents of the ROOT.war file
- Edit the nutch-default.xml which is located at:
- $CATALINA_HOME/webapps/ROOT/WEB-INF/classes/
- look for the entry: searcher.dir and replace it with your index location /index/db
+ {{{% vi $CATALINA_HOME/webapps/ROOT/WEB-INF/classes/nutch-site.xml
+ <?xml version="1.0"?>
+ <?xml-stylesheet type="text/xsl" href="nutch-conf.xsl"?>
+
+ <!-- Do not modify this file directly. Instead, copy entries that you -->
+ <!-- wish to modify from this file into nutch-site.xml and change them -->
+ <!-- there. If nutch-site.xml does not already exist, create it. -->
+
+ <nutch-conf>
+
+ <property>
+ <name>searcher.dir</name>
+ <value>/your_index_folder_path</value>
+ </property>
+
+ </nutch-conf>}}}

  ==== I have two XML files, nutch-default.xml and nutch-site.xml, why? ====
  nutch-default.xml is the out-of-the-box configuration for Nutch. Most configuration can (and should, unless you know what you're doing) stay as it is.
@@ -62, +73 @@

  ==== How can I recover an aborted fetch process? ====
- You have two choices:
- 1) Use the aborted output.
+ Well, you cannot! However, you have two choices to proceed:
+ 1) Recover the pages already fetched and then restart the fetcher.
- * You'll need to touch the file fetcher.done in the segment directory. All the pages that were not crawled will be re-generated for fetch pretty soon. If you fetched lots of pages, and don't want to have to re-fetch them again, this is the best way.
+ * You'll need to create a dummy file called fetcher.done in the segment directory (% touch index/yoursegdir/fetcher.done). All the pages that were not crawled will be re-generated for fetch pretty soon. If you fetched lots of pages, and don't want to have to re-fetch them again, this is the best way.
  2) Discard the aborted output.
  * Delete all folders from the segment folder except the fetchlist folder and restart the fetcher.
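
The two recovery choices for an aborted fetch can be sketched as a pair of small shell helpers. This is only an illustration, not part of Nutch: the function names recover_segment and discard_segment are hypothetical, and the segment path is whatever directory your fetch was writing to (index/yoursegdir in the FAQ text).

```shell
#!/bin/sh
# Sketch only: recover_segment and discard_segment are hypothetical helper
# names, not Nutch commands. Pass the path of the aborted segment directory.

# Choice 1: keep the pages already fetched by creating the dummy
# fetcher.done file in the segment directory.
recover_segment() {
    seg="$1"
    [ -d "$seg" ] || { echo "no such segment: $seg" >&2; return 1; }
    touch "$seg/fetcher.done"
}

# Choice 2: discard the aborted output by deleting every folder in the
# segment directory except fetchlist.
discard_segment() {
    seg="$1"
    [ -d "$seg/fetchlist" ] || { echo "no fetchlist in: $seg" >&2; return 1; }
    for entry in "$seg"/*; do
        # Only folders are removed; plain files are left alone.
        [ -d "$entry" ] || continue
        [ "$(basename "$entry")" = "fetchlist" ] && continue
        rm -rf "$entry"
    done
}
```

For example, recover_segment index/yoursegdir corresponds to choice 1, and discard_segment index/yoursegdir to choice 2; in the second case the fetcher still has to be restarted afterwards. The check for the fetchlist folder is a precaution added here so that the helper refuses to run against a directory that does not look like a segment.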
