Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "CrawlDatumStates" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/CrawlDatumStates?action=diff&rev1=3&rev2=4

  Nutch 1.x maintains state of pages in CrawlDb, which is updated by various tools:
- * Injector - to populate CrawlDb with new URLs
+ *Injector - to populate CrawlDb with new URLs
- * Generator - to generate new fetchlists, and optionally mark those URLs in CrawlDb as "being in the process of fetching"
+ *Generator - to generate new fetchlists, and optionally mark those URLs in CrawlDb as "being in the process of fetching"
- * CrawlDb update - to update the CrawlDb with new knowledge about the already known URLs (already in CrawlDb) as well as add new URLs discovered from page outlinks.
+ *CrawlDb update - to update the CrawlDb with new knowledge about the already known URLs (already in CrawlDb) as well as add new URLs discovered from page outlinks.
  Below is a state diagram of CrawlDatum, which is a class that holds this state in CrawlDb.
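
As a rough illustration of how the three tools described in the page text might interact, here is a minimal toy model in Python. It is a sketch only and not Nutch code: the `State` names, the dictionary used as a stand-in for CrawlDb, and the functions `inject`, `generate`, and `update` are all hypothetical and greatly simplified compared with the real CrawlDatum state diagram.

```python
from enum import Enum


class State(Enum):
    # Hypothetical, simplified state names; not Nutch's actual constants.
    UNFETCHED = "unfetched"
    GENERATED = "generated"  # "being in the process of fetching"
    FETCHED = "fetched"


def inject(crawldb, urls):
    """Injector: populate the CrawlDb stand-in with new URLs."""
    for url in urls:
        # URLs that are already known keep their current state.
        crawldb.setdefault(url, State.UNFETCHED)


def generate(crawldb, mark=True):
    """Generator: build a fetchlist and optionally mark its URLs."""
    fetchlist = sorted(u for u, s in crawldb.items() if s is State.UNFETCHED)
    if mark:
        for url in fetchlist:
            crawldb[url] = State.GENERATED
    return fetchlist


def update(crawldb, fetched, outlinks):
    """CrawlDb update: record fetch results and add URLs found in outlinks."""
    for url in fetched:
        if url in crawldb:
            crawldb[url] = State.FETCHED
    inject(crawldb, outlinks)


crawldb = {}
inject(crawldb, ["http://a.example/"])
fetchlist = generate(crawldb)
update(crawldb, fetchlist, ["http://a.example/", "http://b.example/"])
```

In this toy run, the injected URL ends up marked as fetched, while the URL discovered through an outlink enters the dictionary as unfetched and would be picked up by the next `generate` call.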

