Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by AledJones:
http://wiki.apache.org/nutch/GettingNutchRunningWithWindows

The comment on the change is:

------------------------------------------------------------------------------
  You'll need Tomcat 4.* or higher running on your machine.

- == Crawling ==
+ == Setup ==

  Download the release and extract it anywhere on your hard disk, e.g. c:\nutch-7.0.1

  Create an empty text file in your nutch directory, e.g. "urls", and add the URLs of the sites you want to crawl, as shown in the tutorial.

  Load up cygwin and navigate to your nutch directory. When cygwin launches you'll usually find yourself in your user folder (e.g. C:\Documents and Settings\username).
+
+ == Intranet Crawling ==

  Follow the tutorial instructions to begin the crawl by entering commands in cygwin. Depending on the commands you enter, Nutch should create a crawl directory and a log file.

@@ -32, +34 @@

  bin/nutch crawl urls -dir crawled -depth 3 >& crawl.log
  }}}
  then a folder called crawled is created in your nutch directory, along with the crawl.log file. Use this log file to debug any errors you might have. In my experience you'll need to delete the crawl.log file before starting the crawl off again.

- == Serving ==
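
The setup and crawl steps described in the page above could be scripted roughly as follows from a cygwin shell. This is a minimal sketch, not part of the wiki page: the `NUTCH_HOME` variable, the function names `prepare_crawl` and `run_crawl`, and the seed URL `http://www.example.com/` are assumptions made for illustration. The default path is how cygwin would normally expose `c:\nutch-7.0.1`. The redirection `> crawl.log 2>&1` is the portable spelling of the `>& crawl.log` used on the page.

```shell
#!/bin/bash
# Assumed install location (cygwin view of c:\nutch-7.0.1); adjust as needed.
NUTCH_HOME="${NUTCH_HOME:-/cygdrive/c/nutch-7.0.1}"

# Create the seed list if it is missing and remove the log of a previous run.
# $1 = nutch directory. Runs in a subshell so the caller's directory is unchanged.
prepare_crawl() (
    cd "$1" || exit 1
    # One URL per line; the URL below is only a placeholder.
    [ -f urls ] || echo "http://www.example.com/" > urls
    # The page suggests deleting crawl.log before starting the crawl again.
    rm -f crawl.log
)

# Run the crawl command quoted on the page, sending all output to crawl.log.
# $1 = nutch directory.
run_crawl() (
    cd "$1" || exit 1
    bin/nutch crawl urls -dir crawled -depth 3 > crawl.log 2>&1
)

# Only attempt the crawl when a Nutch install is actually present.
if [ -x "$NUTCH_HOME/bin/nutch" ]; then
    prepare_crawl "$NUTCH_HOME" && run_crawl "$NUTCH_HOME"
else
    echo "bin/nutch not found under $NUTCH_HOME; set NUTCH_HOME first" >&2
fi
```

If the crawl fails, crawl.log in the nutch directory is the first place to look, as the page notes.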
