Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by Gal Nitzan:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  Well, you cannot! However, you have two choices to proceed:

  1) Recover the pages already fetched and then restart the fetcher.
+
-  * You'll need to create a dummy file called fetcher.done in the segment directory, updatedb, generate and restart the fetcher.
+ You'll need to create a dummy file called fetcher.done in the segment directory, updatedb, generate and restart the fetcher.
- Assuming your index is at /index
+ Assuming your index is at /index
  {{{ % touch /index/segments/2005somesegment/fetcher.done
+ % bin/nutch updatedb /index/db/ /index/segments/2005somesegment/
+ % bin/nutch generate /index/db/ /index/segments/2005somesegment/
+ % bin/nutch fetch /index/segments/2005somesegment}}}
  All the pages that were not crawled will be re-generated for fetch. If you fetched lots of pages and don't want to re-fetch them, this is the best way.

  2) Discard the aborted output.
+
-  * Delete all folders from the segment folder except the fetchlist folder and restart the fetcher.
+ Delete all folders from the segment folder except the fetchlist folder and restart the fetcher.

  ==== Who changes the next fetch date? ====
  * After injecting a new URL the next fetch date is set to the current time.
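Choice 2 above (discarding the aborted output) is described only in words, so here is a minimal sketch of how it might be scripted. The function name discard_aborted_output is hypothetical and not part of Nutch; the sketch assumes only what the FAQ text states, namely that the segment directory contains a fetchlist folder that must be kept and other folders that can be deleted.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): delete every folder inside a
# segment directory except "fetchlist", as described in choice 2.
discard_aborted_output() {
    segment_dir=$1

    # Refuse to run if there is no fetchlist to keep; this guards against
    # pointing the function at the wrong directory.
    if [ ! -d "$segment_dir/fetchlist" ]; then
        echo "no fetchlist folder in $segment_dir; nothing deleted" >&2
        return 1
    fi

    for entry in "$segment_dir"/*; do
        # Only folders are removed; plain files are left alone.
        [ -d "$entry" ] || continue
        if [ "$(basename "$entry")" != "fetchlist" ]; then
            rm -rf "$entry"
        fi
    done
}
```

With the example paths used above, this would be called as `discard_aborted_output /index/segments/2005somesegment`, followed by `bin/nutch fetch /index/segments/2005somesegment` to restart the fetcher.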
