[Nutch Wiki] Trivial Update of "NutchTutorial" by WayneBurke

Apache Wiki Wed, 08 Oct 2014 13:24:11 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.


The "NutchTutorial" page has been changed by WayneBurke:
https://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=71&rev2=72

Comment:
typos corrected

   * `ant clean` will remove this directory (keep copies of modified config files)
  
  == 2. Verify your Nutch installation ==
-  * run "`bin/nutch`" - You can confirm a correct installation if you seeing similar to the following:
+  * run "`bin/nutch`" - You can confirm a correct installation if you see something similar to the following:
  
  {{{
  Usage: nutch COMMAND where command is one of:
@@ -154, +154 @@

  === 3.4 Using Individual Commands for Whole-Web Crawling ===
  '''NOTE''': If you previously modified the file `conf/regex-urlfilter.txt` as covered [[#A3._Crawl_your_first_website|here]] you will need to change it back.
  
- Whole-Web crawling is designed to handle very large crawls which may take weeks to complete, running on multiple machines.  This also permits more control over the crawl process, and incremental crawling.  It is important to note that whole Web crawling does not necessarily mean crawling the entire World Wide Web.  We can limit a whole Web crawl to just a list of the URLs we want to crawl.  This is done by using a filter just like we the one we used when we did the `crawl` command (above).
+ Whole-Web crawling is designed to handle very large crawls which may take weeks to complete, running on multiple machines.  This also permits more control over the crawl process, and incremental crawling.  It is important to note that whole Web crawling does not necessarily mean crawling the entire World Wide Web.  We can limit a whole Web crawl to just a list of the URLs we want to crawl.  This is done by using a filter just like the one we used when we did the `crawl` command (above).
  
  ==== Step-by-Step: Concepts ====
  Nutch data is composed of:
@@ -260, +260 @@

  ==== Step-by-Step: Indexing into Apache Solr ====
  Note: For this step you should have Solr installation. If you didn't integrate Nutch with Solr. You should read [[#A4._Setup_Solr_for_search|here]].
  
- Now we are ready!!! To go on and index the all the resources. For more information see [[http://wiki.apache.org/nutch/bin/nutch%20solrindex|this paper]]
+ Now we are ready to go on and index all the resources. For more information see [[http://wiki.apache.org/nutch/bin/nutch%20solrindex|this paper]]
  
  {{{
       Usage: bin/nutch solrindex <solr url> <crawldb> [-linkdb <linkdb>][-params k1=v1&k2=v2...] (<segment> ...| -dir <segments>) [-noCommit] [-deleteGone] [-filter] [-normalize]
