Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by Gal Nitzan:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  
  Well, you cannot! However, you have two choices to proceed:
     1) Recover the pages already fetched and then restart the fetcher.
+ 
-       * You'll need to create a dummy file called fetcher.done in the segment directory, updatedb, generate and restart the fetcher.
+       You'll need to create a dummy file called fetcher.done in the segment directory, updatedb, generate and restart the fetcher.
-         Assuming your index is at /index
+       Assuming your index is at /index
          {{{ % touch /index/segments/2005somesegment/fetcher.done
+ 
  % bin/nutch updatedb /index/db/ /index/segments/2005somesegment/
+ 
  % bin/nutch generate /index/db/ /index/segments/2005somesegment/
+ 
  % bin/nutch fetch /index/segments/2005somesegment}}}
  
  All the pages that were not crawled will be re-generated for fetch. If you fetched lots of pages and don't want to re-fetch them, this is the best way.
  
     2) Discard the aborted output.
+       
-       * Delete all folders from the segment folder except the fetchlist folder and restart the fetcher.
+ Delete all folders from the segment folder except the fetchlist folder and restart the fetcher.
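
The two choices above could be wrapped in small shell helpers, roughly as sketched below. This is only an illustration, not part of the wiki page: the function names and the NUTCH variable are hypothetical, and the nutch invocations simply mirror the commands quoted above, so check them against your Nutch version before relying on them.

```shell
#!/bin/sh
# Sketch only. NUTCH is a hypothetical override for the nutch launcher;
# it defaults to bin/nutch as used in the commands above.
NUTCH="${NUTCH:-bin/nutch}"

# Choice 1: keep the pages already fetched, then fetch the rest.
# $1 = db directory, $2 = aborted segment directory.
recover_segment() {
    db_dir="$1"
    segment_dir="$2"
    # Create the dummy fetcher.done marker, then run the same three
    # nutch commands as above, stopping at the first failure.
    touch "$segment_dir/fetcher.done" &&
    "$NUTCH" updatedb "$db_dir" "$segment_dir" &&
    "$NUTCH" generate "$db_dir" "$segment_dir" &&
    "$NUTCH" fetch "$segment_dir"
}

# Choice 2: throw away the aborted output.
# $1 = aborted segment directory.
discard_aborted_output() {
    segment_dir="$1"
    # Remove every top-level folder in the segment except fetchlist.
    find "$segment_dir" -mindepth 1 -maxdepth 1 -type d ! -name fetchlist \
        -exec rm -rf {} +
}
```

With the paths from the example above, the first choice would be `recover_segment /index/db/ /index/segments/2005somesegment/`. The second helper only deletes folders; the fetcher still has to be restarted on the segment afterwards.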
  
  ==== Who changes the next fetch date? ====
    * After injecting a new url the next fetch date is set to the current time.
