Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by AledJones:
http://wiki.apache.org/nutch/GettingNutchRunningWithWindows

------------------------------------------------------------------------------
  Create an empty text file in your nutch directory, e.g. "urls", and add the 
URLs of the sites you want to crawl, as shown in the tutorial.
  
  Load up cygwin and navigate to your nutch directory.  When cygwin launches 
you'll usually find yourself in your user folder (e.g. C:\Documents and 
Settings\username).
+ 
+ If your workstation needs to go through a Windows authentication proxy to get 
to the internet, you can use an application such as the NTLM Authorization 
Proxy Server [http://www.geocities.com/rozmanov/ntlm/] to get through it.  
You'll then need to edit the nutch-site.xml file so that Nutch's proxy host and 
port settings point to the port opened by the app.
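
As a rough sketch of the nutch-site.xml change described above: the fragment below assumes the proxy application runs on the same machine as Nutch and listens on port 5865. Both values are assumptions for illustration, so substitute whatever host and port your proxy application actually uses. The property names http.proxy.host and http.proxy.port are the ones listed in nutch-default.xml.

```xml
<!-- Goes inside the <configuration> element of conf/nutch-site.xml -->
<property>
  <name>http.proxy.host</name>
  <!-- Assumption: the NTLM proxy app runs on this workstation -->
  <value>localhost</value>
</property>
<property>
  <name>http.proxy.port</name>
  <!-- Assumption: replace with the port your proxy app listens on -->
  <value>5865</value>
</property>
```

Settings in nutch-site.xml override the defaults in nutch-default.xml, so nothing else should need to change for the crawler to route requests through the local proxy.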
  
  == Intranet Crawling ==
  
