Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by Gal Nitzan:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
  
  ==== How can I recover an aborted fetch process? ====
  
-    Well, you can not! However, you have two choices to proceed:
+ Well, you cannot. '''However, you have two choices to proceed''':
  
-    1) Recover the pages already fetched and than restart the fetcher.
+   1) Recover the pages already fetched and then restart the fetcher.
  
-       You'll need to create a dummy file called fetcher.done in the segment 
directory, updatedb, generate and restart the fetcher.
+       You'll need to create a file '''fetcher.done''' in the segment 
directory and then run: updatedb, generate and fetch.
        Assuming your index is at /index
        {{{ % touch /index/segments/2005somesegment/fetcher.done
  
@@ -90, +90 @@

  
        All the pages that were not crawled will be re-generated for fetch. If 
you have already fetched many pages and don't want to fetch them again, this 
is the best way.
  
-    2) Discard the aborted output.
+   2) Discard the aborted output.
        
        Delete all folders from the segment folder except the fetchlist folder 
and restart the fetcher.
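
The two choices above can be sketched as a pair of shell functions. This is a minimal sketch, not a tested recipe: the db location, the function names and the argument order of the bin/nutch commands are assumptions (check the usage message of bin/nutch for your version). By default the Nutch commands are only printed, not run.

```shell
#!/bin/sh
# Dry run by default: commands are echoed. Set NUTCH=bin/nutch to really run them.
NUTCH=${NUTCH:-"echo bin/nutch"}

# Choice 1: keep the pages already fetched.
# Arguments (hypothetical layout): db dir, segments dir, aborted segment dir.
recover_segment() {
    db=$1; segments=$2; segment=$3
    # Mark the aborted segment as finished so updatedb accepts it.
    touch "$segment/fetcher.done"
    $NUTCH updatedb "$db" "$segment"
    $NUTCH generate "$db" "$segments"
    # generate creates a new segment; pick the newest one and fetch it.
    # (In dry-run mode no new segment exists, so the latest existing one is shown.)
    latest=$(ls -d "$segments"/* | tail -1)
    $NUTCH fetch "$latest"
}

# Choice 2: throw away the aborted output and fetch the segment again.
discard_segment() {
    segment=$1
    for entry in "$segment"/*; do
        [ -d "$entry" ] || continue                              # only folders
        [ "$(basename "$entry")" = "fetchlist" ] && continue     # keep fetchlist
        rm -rf "$entry"
    done
    $NUTCH fetch "$segment"
}
```

For example, `recover_segment /index/db /index/segments /index/segments/2005somesegment` follows choice 1, where /index/db is an assumed db location, and `discard_segment /index/segments/2005somesegment` follows choice 2.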
  
  ==== Who changes the next fetch date? ====
+ 
    * After injecting a new URL the next fetch date is set to the current time.
    * Generating a fetchlist advances the date by 7 days.
    * Updating the db sets the date to the current time + 
db.default.fetch.interval - 7 days.
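
The date arithmetic in the three rules above can be worked through in a small sketch. Dates are counted in days, and the value 30 for db.default.fetch.interval is an assumption; read the real value from your configuration. The function names are made up for this illustration.

```shell
#!/bin/sh
# Assumed value of db.default.fetch.interval, in days.
FETCH_INTERVAL=${FETCH_INTERVAL:-30}
# Number of days by which generate moves the next fetch date forward.
GENERATE_SHIFT=7

# $1 = day of the inject; the next fetch date is that same day.
next_fetch_after_inject() { echo "$1"; }

# $1 = current next fetch date; generate moves it 7 days ahead.
next_fetch_after_generate() { echo $(( $1 + GENERATE_SHIFT )); }

# $1 = day on which updatedb runs; result is that day + interval - 7.
next_fetch_after_updatedb() { echo $(( $1 + FETCH_INTERVAL - GENERATE_SHIFT )); }
```

With these assumptions, a URL injected on day 0 has next fetch date 0, `next_fetch_after_generate 0` gives 7, and running updatedb on day 1 gives `next_fetch_after_updatedb 1`, which is 24.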
  
  ==== I have a big fetchlist in my segments folder. How can I fetch only some 
sites at a time? ====
+ 
    * You have to decide how many pages you want to crawl before generating 
segments and use the options of bin/nutch generate.
    * Use -topN to limit the total number of pages.
    * Use -numFetchers to generate multiple small segments.
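
As a rough illustration of the options above, the following sketch builds a generate call with both limits. The db and segments paths, the function name and the argument order are assumptions; only -topN and -numFetchers come from the text. The command is printed rather than run unless NUTCH is set to the real script.

```shell
#!/bin/sh
# Dry run by default: the command is echoed. Set NUTCH=bin/nutch to really run it.
NUTCH=${NUTCH:-"echo bin/nutch"}

# Arguments (hypothetical): db dir, segments dir, page limit, number of segments.
generate_limited() {
    db=$1; segments=$2; top_n=$3; num_fetchers=$4
    $NUTCH generate "$db" "$segments" -topN "$top_n" -numFetchers "$num_fetchers"
}
```

For example, `generate_limited /index/db /index/segments 1000 4` would ask for at most 1000 pages split over 4 segments.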
