Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "IndexWriters" page has been changed by RoannelFernandez:
https://wiki.apache.org/nutch/IndexWriters?action=diff&rev1=6&rev2=7

Comment:
Description for each section

- = Index writers configuration =
- 
  <<TableOfContents(4)>>
  
+ = Index writers in Nutch =
+ 
+ An index writer is a component of the indexing job that sends documents 
from one or more segments to an external server. In Nutch, these components 
are implemented as plugins. Nutch includes these out-of-the-box indexers:
+ 
+ ||'''Indexer''' ||'''Description''' ||
+ ||indexer-solr ||Indexer for a Solr server ||
+ ||indexer-rabbit ||Indexer for a RabbitMQ server ||
+ ||indexer-dummy ||Indexer typically used for debugging; it writes documents to a plain text file ||
+ ||indexer-elastic ||Indexer for an Elasticsearch server ||
+ ||indexer-elastic-rest ||Indexer for Elasticsearch that uses [[https://github.com/searchbox-io/Jest|Jest]] to connect to the REST API provided by Elasticsearch ||
+ ||indexer-cloudsearch ||Indexer for Amazon <<GetText(CloudSearch)>> ||
+ 
- == Structure of index-writers.xml ==
+ = Structure of index-writers.xml =
+ 
+ The indexers are configured in the index-writers.xml file, included in the 
official Nutch distribution. The structure of this file is simple: it consists 
mainly of a list of indexers (`<writer>` elements):
+ 
+ {{{#!highlight xml
+ <writers>
+   <writer id="<writer_id>" class="<implementation_class>">
+     <mapping>
+       ...
+     </mapping>
+     <parameters>
+       ...
+     </parameters>   
+   </writer>
+   ...
+ </writers>
+ }}}
+ 
+ Each `<writer>` element has two mandatory attributes, `id` and `class`:
+ 
+  1. `<writer_id>` (the `id` attribute) is a unique identifier for each 
configuration. It lets Nutch distinguish configurations, even when they are 
for the same index writer, so you can define multiple instances of the same 
index writer with different configurations.
+  1. `<implementation_class>` (the `class` attribute) is the canonical name 
of the class that implements the 
[[https://nutch.apache.org/apidocs/apidocs-1.14/org/apache/nutch/indexer/IndexWriter.html|IndexWriter]]
 extension point. For the indexers provided by Nutch out-of-the-box, the 
possible values of `<implementation_class>` are:
+ 
+ ||'''Indexer''' ||'''Implementation class''' ||
+ ||indexer-solr ||[[https://nutch.apache.org/apidocs/apidocs-1.14/org/apache/nutch/indexwriter/solr/SolrIndexWriter.html|org.apache.nutch.indexwriter.solr.SolrIndexWriter]] ||
+ ||indexer-rabbit ||[[https://nutch.apache.org/apidocs/apidocs-1.14/org/apache/nutch/indexwriter/rabbit/RabbitIndexWriter.html|org.apache.nutch.indexwriter.rabbit.RabbitIndexWriter]] ||
+ ||indexer-dummy ||[[https://nutch.apache.org/apidocs/apidocs-1.14/org/apache/nutch/indexwriter/dummy/DummyIndexWriter.html|org.apache.nutch.indexwriter.dummy.DummyIndexWriter]] ||
+ ||indexer-elastic ||[[https://nutch.apache.org/apidocs/apidocs-1.14/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.html|org.apache.nutch.indexwriter.elastic.ElasticIndexWriter]] ||
+ ||indexer-elastic-rest ||[[https://nutch.apache.org/apidocs/apidocs-1.14/org/apache/nutch/indexwriter/elasticrest/ElasticRestIndexWriter.html|org.apache.nutch.indexwriter.elasticrest.ElasticRestIndexWriter]] ||
+ ||indexer-cloudsearch ||[[https://nutch.apache.org/apidocs/apidocs-1.14/org/apache/nutch/indexwriter/cloudsearch/CloudSearchIndexWriter.html|org.apache.nutch.indexwriter.cloudsearch.CloudSearchIndexWriter]] ||
+ 
+ Each `<writer>` element contains two child elements: `<mapping>` and 
`<parameters>`.
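+ 
+ For illustration, a file with two configurations of the same index writer 
might look like the sketch below. The class name is taken from the table 
above; the `id` values `indexer_solr_1` and `indexer_solr_2` are hypothetical, 
and the contents of `<mapping>` and `<parameters>` are left elided:
+ 
+ {{{#!highlight xml
+ <writers>
+   <!-- first configuration of the Solr index writer (hypothetical id) -->
+   <writer id="indexer_solr_1" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
+     <mapping>
+       ...
+     </mapping>
+     <parameters>
+       ...
+     </parameters>
+   </writer>
+   <!-- second configuration of the same class, told apart by its id -->
+   <writer id="indexer_solr_2" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
+     <mapping>
+       ...
+     </mapping>
+     <parameters>
+       ...
+     </parameters>
+   </writer>
+ </writers>
+ }}}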
  
  == Mapping section ==
  
+ The `<mapping>` element is independent for each configuration. It is where 
you configure the modifications that are applied to each document before it 
is sent to its final destination. The `<mapping>` element contains three child 
elements: `<copy>`, `<rename>` and `<remove>`.
+ 
+  * `<copy>` indicates which fields should be copied from the document and to 
which fields they should be copied. Each child element of `<copy>` has this 
form: `<field source="<source>" dest="<destination>"/>`
+    * `<source>` indicates the name of the field to be copied.
+    * `<destination>` indicates the field or fields to copy into. The value 
of this attribute can be a comma-separated list. In this case, the value of 
the '''source''' field is copied into each field in the list. For example: if 
the configuration is `<field source="title" dest="description,search"/>`, the 
value of the '''title''' field is copied into the '''description''' and 
'''search''' fields.
+  * `<rename>` indicates which fields of the document should be renamed. Each 
child element of `<rename>` has this form: `<field source="<source>" 
dest="<destination>"/>`
+    * `<source>` indicates the name of the field to be renamed.
+    * `<destination>` indicates the new name of the field. For example: if the 
configuration is `<field source="metatag.description" dest="description"/>`, 
the field '''metatag.description''' is renamed to '''description'''.
+  * `<remove>` indicates which fields of the document should be removed. Each 
child element of `<remove>` has this form: `<field source="<source>"/>`
+    * `<source>` indicates the name of the field to be removed.
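+ 
+ As a minimal sketch, the three child elements described above could be 
combined in a single `<mapping>` like this. The '''title''' and 
'''metatag.description''' entries reuse the examples above; the removed field 
'''segment''' is only an illustrative choice, not something prescribed by 
Nutch:
+ 
+ {{{#!highlight xml
+ <mapping>
+   <copy>
+     <!-- copy the value of "title" into the "search" field -->
+     <field source="title" dest="search"/>
+   </copy>
+   <rename>
+     <!-- rename "metatag.description" to "description" -->
+     <field source="metatag.description" dest="description"/>
+   </rename>
+   <remove>
+     <!-- drop the (illustrative) "segment" field before the document is sent -->
+     <field source="segment"/>
+   </remove>
+ </mapping>
+ }}}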
+ 
  == Parameters section ==
+ 
+ The `<parameters>` element is independent for each configuration and is where 
the parameters that the indexer needs are specified. Each parameter has the 
form `<param name="<name>" value="<value>"/>`, and the values it can take depend 
on the indexer that you want to configure. The parameters of each indexer 
provided by Nutch out-of-the-box are described below.
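+ 
+ As a rough sketch, a `<parameters>` element might look like the following. 
The parameter names `type` and `url` and their values are assumptions used 
only for illustration; the section for each indexer lists the parameters it 
actually accepts:
+ 
+ {{{#!highlight xml
+ <parameters>
+   <!-- names and values below are assumed for illustration only -->
+   <param name="type" value="http"/>
+   <param name="url" value="http://localhost:8983/solr/nutch"/>
+ </parameters>
+ }}}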
  
  === Solr indexer properties ===
  
