Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by JimboJw:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
      </property>
  
  Now you can invoke the crawler and index all or part of your disk. The only 
remaining gotcha is that Mozilla will '''not''' load file: URLs from a web page 
fetched over http. If you test with the Nutch web container running in Tomcat, 
clicking on a result will therefore, annoyingly, do nothing. This is mentioned 
[http://www.mozilla.org/quality/networking/testing/filetests.html here], and 
the behavior may be disabled by a 
[http://www.mozilla.org/quality/networking/docs/netprefs.html preference] (see 
security.checkloaduri). IE5 does not have this problem.
+ 
+ ==== How do I index remote file shares? ====
+ 
+ At the current time, Nutch does not have built-in support for accessing files 
over SMB (Windows) shares. This means the only available method is to mount 
the shares yourself, then index the contents as though they were local 
directories (see above).
+ 
+ Note that the share mounting method suffers from the following drawbacks:
+ # The links generated by Nutch will work only for queries from localhost, 
since end users typically won't have the exact same shares mounted in the 
exact same way.
+ # You are limited to the number of mounted shares your operating system 
supports. In *nix environments, this is effectively unlimited, but in Windows 
you may mount at most 26 (one share or drive per letter of the English alphabet).
+ # Links inside documents that point to shares are unlikely to work, since 
they point to the SMB location rather than to the share mounted on your machine.
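
As a rough illustration of the mount-then-index approach, the sketch below turns a list of already-mounted share directories into file: URLs that could be pasted into a crawl seed list. The function name `seed_urls` and the example mount points are hypothetical and are not part of Nutch; mounting the shares is assumed to have been done separately with your operating system's own tools.

```python
from pathlib import Path


def seed_urls(mount_points):
    """Return a file: URL for each mount point that exists as a directory.

    `mount_points` is a hypothetical list of local paths where SMB shares
    have already been mounted; paths that are not directories are skipped.
    """
    urls = []
    for mount in mount_points:
        path = Path(mount).resolve()  # as_uri() requires an absolute path
        if path.is_dir():
            # A trailing slash marks the URL as a directory.
            urls.append(path.as_uri() + "/")
    return urls


if __name__ == "__main__":
    # Example mount points, assumed for illustration only.
    for url in seed_urls(["/mnt/share1", "/mnt/share2"]):
        print(url)
```

Because the URLs are built from local mount paths, they carry the first drawback listed above: they resolve only on a machine that has the same shares mounted at the same locations.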
  
  ==== While indexing documents, I get the following error: ====
  
