The "IntranetDocumentSearch" page has been changed by MichaelAlleblas:
http://wiki.apache.org/nutch/IntranetDocumentSearch

New page:
This page aims to give others an easier start at indexing and searching local 
intranet documents of the kind typically found on an enterprise file share, 
such as Microsoft Office and PDF documents, text files and digital assets.

It draws on various scattered sources of information found online and tries to 
update their suggestions and changes so that they apply to later versions of the 
required software.

= Pre-requisites and Assumptions =
This tutorial assumes you are using the following software and configuration:

 * Apache Nutch 1.3
 * Apache Solr 3.4.0
 * Solr server will be addressed at http://localhost:8983/solr

Other versions may or may not work; I have not tested them.

= Configuration =
== Apache Solr ==
Apache Solr configuration will not be covered here in depth. However, there are 
some things to note when setting up Solr to receive data from a Nutch crawler. 
The Nutch ''conf'' directory contains a ''schema.xml'' file with the Solr schema 
that Nutch uses and expects to be present when posting data. The recommended 
approach is to use this schema in its own core in Solr. This example assumes 
you have a core named ''nutch'' that uses this schema.
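One way to add such a core without editing ''solr.xml'' by hand is Solr's CoreAdmin API. The Python sketch below is only an illustration under stated assumptions: Solr runs with the CoreAdmin handler at ''/solr/admin/cores'', and a hypothetical instance directory named ''nutch'' under the Solr home already contains a ''conf'' directory holding a ''solrconfig.xml'' and the ''schema.xml'' copied from Nutch. It builds the CREATE request URL and offers a helper to send it; declaring the core statically in ''solr.xml'' is an equally valid route.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL assumed throughout this tutorial.
SOLR_BASE = "http://localhost:8983/solr"


def core_create_url(base, name, instance_dir):
    """Build a CoreAdmin CREATE request URL for a new Solr core.

    Assumes the CoreAdmin handler is mounted at <base>/admin/cores and that
    instance_dir (a hypothetical directory under the Solr home) already has
    a conf/ directory with solrconfig.xml and Nutch's schema.xml.
    """
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "instanceDir": instance_dir,
    })
    return f"{base}/admin/cores?{params}"


def create_core(base, name, instance_dir):
    """Send the CREATE request to a running Solr server; returns the raw body."""
    with urlopen(core_create_url(base, name, instance_dir)) as resp:
        return resp.read()


if __name__ == "__main__":
    # Only prints the request URL; call create_core() to actually send it.
    print(core_create_url(SOLR_BASE, "nutch", "nutch"))
```

With the assumed names, the printed URL is ''http://localhost:8983/solr/admin/cores?action=CREATE&name=nutch&instanceDir=nutch'', which can also simply be opened in a browser.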

When configured correctly, there should be a core located at 
''http://localhost:8983/solr/nutch''. You can test this by accessing the 
administration page at ''http://localhost:8983/solr/nutch/admin'' where you can 
also verify that the schema is being correctly loaded.
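The schema check can also be scripted. This Python sketch assumes the core's ''solrconfig.xml'' registers Solr's admin handlers under ''/admin/'' (as the Solr example configuration does), so the raw schema is served at ''http://localhost:8983/solr/nutch/admin/file/?file=schema.xml''. The set of expected field names is an assumption, a small subset of what Nutch's ''schema.xml'' declares; adjust it to match your copy.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Core URL assumed in this tutorial.
CORE_URL = "http://localhost:8983/solr/nutch"

# Assumed subset of the fields declared in Nutch's schema.xml.
EXPECTED_FIELDS = {"id", "url", "title", "content"}


def schema_url(core_url):
    """URL of the raw schema.xml served by the core's admin file handler."""
    return f"{core_url}/admin/file/?file=schema.xml"


def missing_fields(schema_xml, expected=EXPECTED_FIELDS):
    """Return the expected field names that the schema does not declare."""
    root = ET.fromstring(schema_xml)
    # <field> elements only; <dynamicField> and <fieldType> are other tags.
    declared = {f.get("name") for f in root.iter("field")}
    return set(expected) - declared


def check_core(core_url=CORE_URL):
    """Fetch the live schema from a running core and report missing fields."""
    with urlopen(schema_url(core_url)) as resp:
        return missing_fields(resp.read())
```

With Solr running, ''check_core()'' should return an empty set if the core has loaded a schema containing the expected fields; a connection error or HTTP error suggests the core itself is not available at that URL.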
