Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/FAQ

The comment on the change is:
Just some formatting.

------------------------------------------------------------------------------
  % cp nutch-0.7.war $CATALINA_HOME/webapps/ROOT.war
  * After building your first index, start Tomcat from the index folder.
- Assuming your index is located at /index/db/
+ Assuming your index is located at /index/db/:
- % cd /index/db/
+ {{{% cd /index/db/
- % $CATATALINA_HOME/bin/startup.sh
+ % $CATATALINA_HOME/bin/startup.sh}}}
  * After building your first index, start Tomcat from the index folder.
  Start Tomcat
- % $CATATALINA_HOME/bin/startup.sh
+ % $CATATALINA_HOME/bin/startup.sh
  Stop Tomcat
- % $CATATALINA_HOME/bin/startup.sh
+ % $CATATALINA_HOME/bin/startup.sh
  Tomcat has extracted the contens of the ROOT.war file
  Edit the nutch-default.xml which is located at:
  $CATATALINA_HOME/bin/webapps/ROOT/WEB-INF/classes/
@@ -59, +59 @@
  ==== How can I recover an aborted fetch process? ====
  You have two choices:
- 1) Use the aborted output. You'll need to touch the file fetcher.done in the segment directory. All the pages that were not crawled will be re-generated for fetch pretty soon. If you fetched lots of pages, and don't want to have to re-fetch them again, this is the best way.
+ 1. Use the aborted output. You'll need to touch the file fetcher.done in the segment directory. All the pages that were not crawled will be re-generated for fetch pretty soon. If you fetched lots of pages, and don't want to have to re-fetch them again, this is the best way.
- 2) Discard the aborted output. To do this, just delete the fetcher* directories in the segment and restart the fetcher.
+ 2. Discard the aborted output. To do this, just delete the fetcher* directories in the segment and restart the fetcher.
  ==== Who changes the next fetch date? ====
  * After injecting a new url the next fetch date is set to the current time.
