Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by AledJones:
http://wiki.apache.org/nutch/GettingNutchRunningWithWindows

The comment on the change is:
 

------------------------------------------------------------------------------
  
  You'll need Tomcat 4.* or higher running on your machine.
  
- == Crawling ==
+ == Setup ==
  
  Download the release and extract it anywhere on your hard disk, e.g. 
c:\nutch-7.0.1
  
  Create an empty text file in your nutch directory, e.g. "urls", and add the 
URLs of the sites you want to crawl as shown in the tutorial.
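
As a minimal sketch of the step above, the seed file can also be created from the cygwin prompt. The file name "urls" comes from the text; the address http://example.com/ is only a placeholder, and the one-URL-per-line layout is an assumption about what the tutorial expects.

```shell
# Create the seed list in the current directory (assumed to be the nutch directory).
# http://example.com/ is a placeholder; replace it with the sites you want to crawl.
printf '%s\n' 'http://example.com/' > urls

# Show what was written, one URL per line.
cat urls
```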
  
  Load up cygwin and navigate to your nutch directory.  When cygwin launches 
you'll usually find yourself in your user folder (e.g. C:\Documents and 
Settings\username).
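
The navigation step might look like the sketch below. It assumes Nutch was extracted to c:\nutch-7.0.1 as in the example above, and that cygwin is using its default /cygdrive prefix for Windows drives; the NUTCH_DIR variable is introduced here purely for illustration.

```shell
# Assumption: Nutch was extracted to c:\nutch-7.0.1, as in the example above.
# With default settings cygwin exposes Windows drives under /cygdrive,
# so C:\ becomes /cygdrive/c.
NUTCH_DIR="/cygdrive/c/nutch-7.0.1"

if [ -d "$NUTCH_DIR" ]; then
    # Move into the nutch directory and print it to confirm where we are.
    cd "$NUTCH_DIR" && pwd
else
    echo "Directory not found: $NUTCH_DIR" >&2
fi
```

If you extracted the release somewhere else, adjust NUTCH_DIR to match.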
+ 
+ == Intranet Crawling ==
  
  Follow the tutorial instructions to begin the crawl by entering commands in 
cygwin. Depending on the commands you enter, Nutch should create a crawl 
directory and a log file.
  
@@ -32, +34 @@

  bin/nutch crawl urls -dir crawled -depth 3 >& crawl.log
  }}}
  then a folder called crawled is created in your nutch directory, along with 
the crawl.log file.  Use this log file to debug any errors you might have.  
In my experience, you'll need to delete the crawl.log file before starting the 
crawl again.
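
A re-run along those lines could be sketched as follows. This is only an illustration: it reuses the crawl command shown above, writes the redirection as "> crawl.log 2>&1" (which bash treats the same as ">& crawl.log"), and the check for bin/nutch is an added safeguard rather than something the tutorial requires.

```shell
# Remove a log left over from an earlier run; -f means no error if it is absent.
rm -f crawl.log

# Only start the crawl if the Nutch launcher is present in the current directory.
if [ -x bin/nutch ]; then
    # Same command as above, with stdout and stderr both sent to crawl.log.
    bin/nutch crawl urls -dir crawled -depth 3 > crawl.log 2>&1
else
    echo "bin/nutch not found; run this from your nutch directory" >&2
fi
```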
- 
  
  == Serving ==
  
