Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "Tutorial on incremental crawling" page has been changed by Gabriele Kahlout.
http://wiki.apache.org/nutch/Tutorial%20on%20incremental%20crawling?action=diff&rev1=12&rev2=13

--------------------------------------------------
- The following scripts crawl the whole-web incrementally;
+ The following script does whole-web-crawling incrementally.

- Input: a list of urls to crawl
+ '''Input''': a list of urls to crawl

- Output: Nutch will continuously fetch $it_size urls from the input list, index and merge them with the whole-web index (so that they can be immediately searched) until all urls have been fetched.
+ '''Output''': Nutch will continuously fetch $it_size urls from the input list, index and merge them with the whole-web index (so that they can be immediately searched) until all urls have been fetched.

- Tested with Nutch-1.2 release (see [[Incremental Crawling Scripts Test|tests output]]); If you don't have Nutch up and running, follow [[Tutorial]]
+ Tested with Nutch-1.2 release (see [[Incremental Crawling Scripts Test|tests output]]); If you don't have Nutch up and running, follow [[NutchTutorial|this tutorial]].

  === Script Editions: ===

   1. Abridged using Solr (tersest)

