Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by Gal Nitzan:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  ==== How can I recover an aborted fetch process? ====

- Well, you can not! However, you have two choices to proceed:
+ Well, you can not! '''However, you have two choices to proceed:'''

  1) Recover the pages already fetched and then restart the fetcher. You'll need to create a dummy file called fetcher.done in the segment directory, updatedb, generate and restart the fetcher. Assuming your index is at /index

- {{{ % touch /index/segments/2005somesegment/fetcher.done
+ {{{ % touch /index/segments/2005somesegment/fetcher.done

  % bin/nutch updatedb /index/db/ /index/segments/2005somesegment/

@@ -87, +87 @@
  % bin/nutch fetch /index/segments/2005somesegment}}}

- All the pages that were not crawled will be re-generated for fetch. If you fetched lots of pages, and don't want to have to re-fetch them again, this is the best way.
+ All the pages that were not crawled will be re-generated for fetch. If you fetched lots of pages, and don't want to have to re-fetch them again, this is the best way.

  2) Discard the aborted output.

- Delete all folders from the segment folder except the fetchlist folder and restart the fetcher.
+ Delete all folders from the segment folder except the fetchlist folder and restart the fetcher.

  ==== Who changes the next fetch date? ====

  * After injecting a new url the next fetch date is set to the current time.
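Option 2 of the recovery answer above (discard the aborted output) could be scripted roughly as follows. This is only a sketch: the function name discard_aborted_output is hypothetical and not part of Nutch, and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find do; these options are not in POSIX).

```shell
#!/bin/sh
# Hypothetical helper, not part of Nutch: remove every subdirectory of a
# segment directory except "fetchlist", so the fetcher can be restarted.
discard_aborted_output() {
    segment="$1"

    # Refuse to delete anything unless the fetchlist directory is present,
    # so a mistyped path does not wipe an unrelated directory.
    if [ ! -d "$segment/fetchlist" ]; then
        echo "no fetchlist directory in $segment" >&2
        return 1
    fi

    # Only look at direct children (depth 1) that are directories and are
    # not named "fetchlist", and remove them recursively.
    find "$segment" -mindepth 1 -maxdepth 1 -type d ! -name fetchlist \
        -exec rm -rf {} +
}
```

With the example paths from the FAQ entry, one would call discard_aborted_output /index/segments/2005somesegment and then rerun bin/nutch fetch /index/segments/2005somesegment.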
