Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "IntranetDocumentSearch" page has been changed by MichaelAlleblas:
http://wiki.apache.org/nutch/IntranetDocumentSearch

New page:

This page aims to give others an easier start with indexing and searching local intranet documents of the kind typically found on an enterprise file share. These include Microsoft Office and PDF documents, text files and digital assets. It draws on various sparse sources of information found online on the topic and tries to adapt their suggestions and changes to later versions of the required software.

= Pre-requisites and Assumptions =
This tutorial assumes you are using the following software and configuration:

 * Apache Nutch 1.3
 * Apache Solr 3.4.0
 * Solr server will be addressed at http://localhost:8983/solr

Other versions may or may not work; I have not tested them.

= Configuration =
== Apache Solr ==
Apache Solr configuration will not be covered here in depth. However, there are some things to note when setting up Solr to receive data from a Nutch crawler.

The ''schema.xml'' file in the Nutch ''conf'' directory contains the Solr schema that Nutch uses and expects to be present when posting data. A recommended course of action is to use this schema in its own core in Solr. This example assumes you have a core named ''nutch'' with this schema.

When configured correctly, the core should be available at ''http://localhost:8983/solr/nutch''. You can test this by opening the administration page at ''http://localhost:8983/solr/nutch/admin'', where you can also verify that the schema is being loaded correctly.
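The check on the core's administration page can also be scripted. The following is a minimal Python sketch, not part of Nutch or Solr: it only builds the core and admin URLs from the base address and core name assumed in this tutorial, then reports whether the admin page answers with HTTP 200. The function names (`core_url`, `admin_url`, `is_reachable`) are hypothetical helpers introduced for illustration.

```python
import urllib.error
import urllib.request

# Values taken from the tutorial's assumptions; adjust for your setup.
SOLR_BASE = "http://localhost:8983/solr"
CORE_NAME = "nutch"


def core_url(base=SOLR_BASE, core=CORE_NAME):
    """Return the URL of a Solr core, tolerating stray slashes."""
    return base.rstrip("/") + "/" + core.strip("/")


def admin_url(base=SOLR_BASE, core=CORE_NAME):
    """Return the URL of the core's administration page."""
    return core_url(base, core) + "/admin"


def is_reachable(url, opener=urllib.request.urlopen, timeout=5):
    """Return True if the URL answers with HTTP 200, False otherwise.

    `opener` is injectable (hypothetical design choice) so the check
    can be exercised without a running Solr server.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error status, and similar.
        return False


if __name__ == "__main__":
    url = admin_url()
    state = "reachable" if is_reachable(url) else "not reachable"
    print(url, "is", state)
```

A reachable admin page only shows that the core exists; whether the Nutch schema is the one actually loaded still has to be confirmed on the administration page itself.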

