Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "Tutorial on incremental crawling" page has been changed by Gabriele Kahlout:
http://wiki.apache.org/nutch/Tutorial%20on%20incremental%20crawling?action=diff&rev1=9&rev2=10

--------------------------------------------------

- The following scripts crawl the whole-web incrementally; Specifying a list of urls to crawl, nutch will continuously fetch $it_size urls from a specified list of urls, index and merge them with our whole-web index, so that they can be immediately searched, until all urls have been fetched.
+ The following scripts crawl the whole-web incrementally;
+ Input:
+ a list of urls to crawl
+
+ Output:
+ Nutch will continuously fetch $it_size urls from the input list, index and merge them with the whole-web index (so that they can be immediately searched) until all urls have been fetched.
+
+
- Tested with Nutch-1.2 release [[Incremental Crawling Scripts Test][Output]. Please report any bug you find on the mailing list and to [[Gabriele Kahlout|me]].
+ Tested with Nutch-1.2 release [[Incremental Crawling Scripts Test|Output]]. Please report any bug you find on the mailing list and to [[Gabriele Kahlout|me]].
+ If you don't have Nutch up and running, follow [[Tutorial]].
+ Script Editions:
-
- If not ready, follow [[Tutorial]] to setup and configure Nutch on your machine.
-
- Follow 2 script: 1. Abridged script using Solr; 2. Unabridged script with explanations and using nutch index.
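The loop described in the new "Output" paragraph works like this: take the next $it_size urls, fetch and index them, merge the result into the whole-web index, and repeat until the list is exhausted. Below is a minimal Python sketch of that control flow. The actual scripts are not included in this notification. The callables `fetch_and_index` and `merge_into_main_index` are hypothetical placeholders for the Nutch fetch/index step and the index-merge step. They are not Nutch APIs.

```python
from typing import Callable, Iterator, List


def batches(urls: List[str], it_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most it_size urls from the input list."""
    if it_size <= 0:
        raise ValueError("it_size must be positive")
    for start in range(0, len(urls), it_size):
        yield urls[start:start + it_size]


def incremental_crawl(
    urls: List[str],
    it_size: int,
    fetch_and_index: Callable[[List[str]], None],
    merge_into_main_index: Callable[[], None],
) -> int:
    """Process the url list in rounds of it_size and return the number of rounds.

    Each round's documents are merged into the main index before the next
    round starts, so they become searchable incrementally.
    """
    rounds = 0
    for batch in batches(urls, it_size):
        fetch_and_index(batch)   # hypothetical: fetch and index only this batch
        merge_into_main_index()  # hypothetical: merge the batch index into the whole-web index
        rounds += 1
    return rounds


if __name__ == "__main__":
    seen: List[List[str]] = []
    n = incremental_crawl(
        [f"http://example.com/{i}" for i in range(7)],
        it_size=3,
        fetch_and_index=seen.append,
        merge_into_main_index=lambda: None,
    )
    print(n)                        # 3
    print([len(b) for b in seen])   # [3, 3, 1]
```

Merging after every batch, rather than once at the end, is what lets each batch become searchable immediately. The trade-off is one index merge per round.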

