Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/NutchTutorial

------------------------------------------------------------------------------
  Now we're ready to crawl. There are two approaches to crawling:
   1. Using the '''crawl''' command to perform all the crawl steps with a single command. This is sometimes referred to as '''Intranet Crawling'''. Although a simple way to get started, it has limitations.
-  2. Using the lower level inject, generate, fetch and updatedb commands. Sometimes refferred to as '''Whole-Web Crawling''' this allows you more control of each step of the process and is required to be able to update existing data.
+  2. Using the lower level inject, generate, fetch and updatedb commands. Sometimes referred to as '''Whole-Web Crawling''' this allows you more control of each step of the process and is required to be able to update existing data.
  
  == The Crawl Command ==
- The crawl comamnd is more appropriate when you intend to crawl up to around one million pages on a handful of web servers.
+ The crawl command is more appropriate when you intend to crawl up to around one million pages on a handful of web servers.
  
  === Crawl Command: Configuration ===
