Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "Tutorial on incremental crawling" page has been changed by Gabriele Kahlout.
http://wiki.apache.org/nutch/Tutorial%20on%20incremental%20crawling?action=diff&rev1=8&rev2=9

--------------------------------------------------

  The following scripts crawl the whole-web incrementally;

  Specifying a list of urls to crawl, nutch will continuously fetch $it_size urls from a specified list of urls, index and merge them with our whole-web index, so that they can be immediately searched, until all urls have been fetched.

- Tested with Nutch-1.2 release. Please report any bug you find on the mailing list and to me [[Gabriele Kahlout|me]].
+ Tested with Nutch-1.2 release [[Incremental Crawling Scripts Test][Output]. Please report any bug you find on the mailing list and to [[Gabriele Kahlout|me]].
+
+ If not ready, follow [[Tutorial]] to setup and configure Nutch on your machine.

@@ -57, +58 @@

  while [[ $i -lt $depth ]]
  do
      cmd="bin/nutch generate $it_crawldb crawl/segments -topN $it_size"
-     $cmd
      output=`$cmd`
      if [[ $output == *'0 records selected for fetching'* ]]
      then
@@ -157, +157 @@

      echo
      cmd="bin/nutch generate $it_crawldb crawl/segments -topN $it_size"
      echo $cmd
-     $cmd
      output=`$cmd`
      echo $output
      if [[ $output == *'0 records selected for fetching'* ]] #all the urls of this iteration have been fetched
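
Both script hunks in the diff above remove a stray `$cmd` line, so the generate command appears to run once per iteration, with its output captured for the '0 records selected for fetching' check. The following is a minimal sketch of that run-once-and-capture pattern, not the wiki script itself. It assumes a hypothetical stub, `fake_generate`, standing in for `bin/nutch generate` so that it can run without a Nutch install; the temporary call-count file is likewise only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `bin/nutch generate ...`; NOT part of Nutch.
# It logs every invocation to a temp file and, from the third call on,
# prints the message the wiki script looks for.
calls_file=$(mktemp)

fake_generate() {
    echo "call" >> "$calls_file"
    local n
    n=$(( $(wc -l < "$calls_file") ))
    if [[ $n -ge 3 ]]; then
        echo "Generator: 0 records selected for fetching, exiting ..."
    else
        echo "Generator: segment $n"
    fi
}

iterations=0
while true
do
    cmd="fake_generate"
    # Run the command exactly once and keep its output for the check below.
    # A bare `$cmd` line before this one would run the command a second time.
    output=$($cmd)
    if [[ $output == *'0 records selected for fetching'* ]]
    then
        break   # nothing left to fetch in this iteration
    fi
    iterations=$((iterations + 1))
done

# Count how many times the stub was actually invoked.
calls=$(( $(wc -l < "$calls_file") ))
rm -f "$calls_file"
echo "iterations=$iterations calls=$calls"
```

With the stub as written, the loop body completes twice and the stub is invoked three times, i.e. one invocation per pass through the loop. Adding a bare `$cmd` line before the capture would double the invocation count, which is the behaviour the diff removes.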

