Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "CrawlDatumStates" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/CrawlDatumStates?action=diff&rev1=3&rev2=4

  
  Nutch 1.x maintains state of pages in CrawlDb, which is updated by various tools:
  
-  * Injector - to populate CrawlDb with new URLs
+  *Injector - to populate CrawlDb with new URLs
-  * Generator - to generate new fetchlists, and optionally mark those URLs in CrawlDb as "being in the process of fetching"
+  *Generator - to generate new fetchlists, and optionally mark those URLs in CrawlDb as "being in the process of fetching"
-  * CrawlDb update - to update the CrawlDb with new knowledge about the already known URLs (already in CrawlDb) as well as add new URLs discovered from page outlinks.
+  *CrawlDb update - to update the CrawlDb with new knowledge about the already known URLs (already in CrawlDb) as well as add new URLs discovered from page outlinks.
  
  Below is a state diagram of CrawlDatum, which is a class that holds this state in CrawlDb.
  
