Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by AledJones:
http://wiki.apache.org/nutch/GettingNutchRunningWithWindows

------------------------------------------------------------------------------

  Create an empty text file in your nutch directory (e.g. "urls") and add the URLs of the sites you want to crawl, as shown in the tutorial.

  Load up Cygwin and navigate to your nutch directory. When Cygwin launches, you'll usually find yourself in your user folder (e.g. C:\Documents and Settings\username).
+ 
+ If your workstation has to go through a Windows authentication (NTLM) proxy to reach the internet, you can use an application such as the NTLM Authorization Proxy Server [http://www.geocities.com/rozmanov/ntlm/] to get through it. You'll then need to edit the nutch-site.xml file so that Nutch uses the local port opened by that application as its proxy.

  == Intranet Crawling ==
