Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by XueyongZhi:
http://wiki.apache.org/nutch/IntranetRecrawl

------------------------------------------------------------------------------
  [[TableOfContents]]
+ 
+ '''NOTE: the scripts listed here do not recrawl correctly. Each run adds 
additional depth (as specified by the user) to the crawl. To avoid this, use 
the '-noAdditions' option of the 'updatedb' command. An annoying problem 
remains, however: if you have used the 'crawl' command, newly discovered URLs 
have already been added to the crawldb and will be fetched by the next 'fetch' 
command.
+ 
+ So the problem is that you will see more pages being crawled by this recrawl 
script, not just the pages you have already fetched.'''
  
  Here are a couple of scripts for recrawling your Intranet.
  
