Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "NutchTutorial" page has been changed by WayneBurke:
https://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=71&rev2=72

Comment:
typos corrected

   * `ant clean` will remove this directory (keep copies of modified config files)
  
  == 2. Verify your Nutch installation ==
-  * run "`bin/nutch`" - You can confirm a correct installation if you seeing similar to the following:
+  * run "`bin/nutch`" - You can confirm a correct installation if you see something similar to the following:
  
  {{{
  Usage: nutch COMMAND where command is one of:
@@ -154, +154 @@

  === 3.4 Using Individual Commands for Whole-Web Crawling ===
  '''NOTE''': If you previously modified the file `conf/regex-urlfilter.txt` as covered [[#A3._Crawl_your_first_website|here]] you will need to change it back.
  
- Whole-Web crawling is designed to handle very large crawls which may take weeks to complete, running on multiple machines. This also permits more control over the crawl process, and incremental crawling. It is important to note that whole Web crawling does not necessarily mean crawling the entire World Wide Web. We can limit a whole Web crawl to just a list of the URLs we want to crawl. This is done by using a filter just like we the one we used when we did the `crawl` command (above).
+ Whole-Web crawling is designed to handle very large crawls which may take weeks to complete, running on multiple machines. This also permits more control over the crawl process, and incremental crawling. It is important to note that whole Web crawling does not necessarily mean crawling the entire World Wide Web. We can limit a whole Web crawl to just a list of the URLs we want to crawl. This is done by using a filter just like the one we used when we did the `crawl` command (above).
  
  ==== Step-by-Step: Concepts ====
  Nutch data is composed of:
@@ -260, +260 @@

  ==== Step-by-Step: Indexing into Apache Solr ====
  Note: For this step you should have Solr installation. If you didn't integrate Nutch with Solr.
  You should read [[#A4._Setup_Solr_for_search|here]].
  
- Now we are ready!!! To go on and index the all the resources. For more information see [[http://wiki.apache.org/nutch/bin/nutch%20solrindex|this paper]]
+ Now we are ready to go on and index all the resources. For more information see [[http://wiki.apache.org/nutch/bin/nutch%20solrindex|this paper]]
  
  {{{
  Usage: bin/nutch solrindex <solr url> <crawldb> [-linkdb <linkdb>][-params k1=v1&k2=v2...] (<segment> ...| -dir <segments>) [-noCommit] [-deleteGone] [-filter] [-normalize]

