[Nutch Wiki] Update of "Tutorial on incremental crawling" by Gabriele Kahlout

Apache Wiki Sun, 27 Mar 2011 06:24:40 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.


The "Tutorial on incremental crawling" page has been changed by Gabriele 
Kahlout.
http://wiki.apache.org/nutch/Tutorial%20on%20incremental%20crawling?action=diff&rev1=8&rev2=9

--------------------------------------------------

  The following scripts crawl the whole web incrementally. Given a list of URLs to crawl, Nutch repeatedly fetches $it_size URLs from that list, indexes them and merges them into our whole-web index so that they can be searched immediately, until all URLs have been fetched.
  
- Tested with Nutch-1.2 release. Please report any bug you find on the mailing list and to me [[Gabriele Kahlout|me]].
+ Tested with Nutch-1.2 release [[Incremental Crawling Scripts Test|Output]]. Please report any bug you find on the mailing list and to [[Gabriele Kahlout|me]].
+ 
  
  If you have not done so yet, follow [[Tutorial]] to set up and configure Nutch on your machine.
  
@@ -57, +58 @@

        while [[ $i -lt $depth ]]
        do              
                cmd="bin/nutch generate $it_crawldb crawl/segments -topN $it_size"
-               $cmd
                output=`$cmd`
                if [[ $output == *'0 records selected for fetching'* ]]
                then
@@ -157, +157 @@

                echo
                cmd="bin/nutch generate $it_crawldb crawl/segments -topN $it_size"
                echo $cmd
-               $cmd
                output=`$cmd`
                echo $output
                if [[ $output == *'0 records selected for fetching'* ]] #all the urls of this iteration have been fetched
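Both hunks drop a bare `$cmd` line that ran `bin/nutch generate` right before `output=`$cmd``. Because the backtick capture already executes the command, the old scripts invoked the generator twice per iteration. The bash sketch below shows the corrected pattern (one invocation, output captured, then tested for the same sentinel string the scripts match on). `fake_generate`, its temporary state files and the starting count of 25 URLs are hypothetical stand-ins so the loop runs without a Nutch installation; they are not part of Nutch or of the original scripts.

```shell
#!/usr/bin/env bash
# Sketch: run the generate step once per iteration and keep its output.
# fake_generate is a HYPOTHETICAL stand-in for
#   bin/nutch generate $it_crawldb crawl/segments -topN $it_size
# so that the loop can run without a Nutch installation.
set -u

state=$(mktemp)   # URLs not yet generated (made-up starting value below)
calls=$(mktemp)   # how many times the generate step was invoked
echo 25 > "$state"
echo 0 > "$calls"

fake_generate() {
  local top_n=$1 remaining n
  remaining=$(<"$state")
  echo $(( $(<"$calls") + 1 )) > "$calls"
  n=$(( remaining < top_n ? remaining : top_n ))   # select at most top_n URLs
  echo $(( remaining - n )) > "$state"
  if (( n == 0 )); then
    echo "Generator: 0 records selected for fetching"
  else
    echo "Generator: new segment created (stub)"
  fi
}

it_size=10
depth=10
i=0
while [[ $i -lt $depth ]]; do
  output=$(fake_generate "$it_size")   # runs the step once and captures stdout
  echo "$output"
  if [[ $output == *'0 records selected for fetching'* ]]; then
    break                              # every URL has been generated
  fi
  # fetch, update the crawldb, index and merge the new segment here
  i=$(( i + 1 ))
done

generate_calls=$(<"$calls")
echo "iterations=$i generate_calls=$generate_calls"
rm -f "$state" "$calls"
```

Starting from 25 URLs with `it_size=10`, the sketch makes three productive passes plus a fourth call that hits the sentinel, and ends by printing `iterations=3 generate_calls=4`. Putting the bare `$cmd` back in front of the capture would make each pass call the generator twice instead of once.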
