[Nutch Wiki] Update of "Tutorial on incremental crawling" by Gabriele Kahlout

Apache Wiki Sun, 27 Mar 2011 06:28:23 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.


The "Tutorial on incremental crawling" page has been changed by Gabriele 
Kahlout.
http://wiki.apache.org/nutch/Tutorial%20on%20incremental%20crawling?action=diff&rev1=9&rev2=10

--------------------------------------------------

- The following scripts crawl the whole-web incrementally; Specifying a list of urls to crawl, nutch will continuously fetch $it_size urls from a specified list of urls, index and merge them with our whole-web index,  so that they can be immediately searched, until all urls have been fetched.
+ The following scripts crawl the whole-web incrementally;
  
+ Input:
+ a list of urls to crawl
+ 
+ Output: 
+ Nutch will continuously fetch $it_size urls from the input list, index and merge them with the whole-web index (so that they can be immediately searched) until all urls have been fetched.
+ 
+ 
- Tested with Nutch-1.2 release [[Incremental Crawling Scripts Test][Output]. Please report any bug you find on the mailing list and to [[Gabriele Kahlout|me]].
+ Tested with Nutch-1.2 release [[Incremental Crawling Scripts Test|Output]]. Please report any bug you find on the mailing list and to [[Gabriele Kahlout|me]].
+ If you don't have Nutch up and running, follow [[Tutorial]].
  
+ Script Editions:
- 
- If not ready, follow [[Tutorial]] to setup and configure Nutch on your machine.
- 
- Follow 2 script:
  
  1. Abridged script using Solr;
  2. Unabridged script with explanations and using nutch index.
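
The batching behaviour described above (take $it_size urls at a time from the input list until none remain) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the script from the wiki page: process_batch is a hypothetical placeholder for the real fetch, index and merge steps, and the example urls and temporary files are made up for the demonstration.

```shell
#!/bin/sh
# Illustrative sketch only: walk a url list in batches of $it_size.
# Nothing here calls Nutch; process_batch is a hypothetical stand-in.

it_size=2                       # urls per iteration (assumed value)
urls_file=$(mktemp)             # stands in for the real input list
printf '%s\n' \
    http://a.example/ \
    http://b.example/ \
    http://c.example/ > "$urls_file"

# Placeholder for the per-batch work (fetch, index, merge with the index).
process_batch() {
    echo "processing $(wc -l < "$1" | tr -d ' ') url(s) from $1"
}

total=$(wc -l < "$urls_file" | tr -d ' ')
start=1
batches=0
while [ "$start" -le "$total" ]; do
    end=$((start + it_size - 1))
    batch_file=$(mktemp)
    # Copy lines start..end of the input list into this batch.
    sed -n "${start},${end}p" "$urls_file" > "$batch_file"
    process_batch "$batch_file"
    rm -f "$batch_file"
    start=$((end + 1))
    batches=$((batches + 1))
done
rm -f "$urls_file"
```

The last batch may hold fewer than $it_size urls; the loop stops once the start line passes the end of the list.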
