Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "IndexWriters" page has been changed by RoannelFernandez:
https://wiki.apache.org/nutch/IndexWriters?action=diff&rev1=12&rev2=13

Comment:
Resolving some feedback

    * `<destination>` indicates the new name of the field. For example, if the configuration is `<field source="metatag.description" dest="description"/>`, the field '''metatag.description''' will be renamed to '''description'''.
  * `<remove>` indicates which fields of the document should be removed. Each child element of the `<remove>` element has the form: `<field source="<source>"/>`
    * `<source>` indicates the name of the field to be removed.
+ 
+ {{{{#!wiki caution
+ '''Mapping section can't be empty'''
+ 
+ If you don't want to modify the document, just leave `<copy>`, `<rename>` and `<remove>` empty, like: `<mapping> <copy /> <rename /> <remove /> </mapping>`
+ 
+ }}}}
+ 
+ === Use case ===
+ 
+ We have two servers previously configured (Solr and RabbitMQ). We want to send documents to each one, but with a different structure. Prior to the index step, each document has this hypothetical structure:
+ 
+ {{{#!highlight properties
+ host: "www.example.org"
+ domain: "example.org"
+ title: "Example page"
+ metatag.description: "Example page description"
+ metatag.keywords: ["example", "page"]
+ segment: 20180621163128
+ }}}
+ With this configuration, the structure of each document is modified in a different way depending on the index writer:
+ 
+ {{{#!highlight xml
+ <writer id="indexer_solr_1" class="org.apache.nutch.indexwriter.solr.SolrIndexWriter">
+   <parameters>
+     <!-- Parameters here -->
+   </parameters>
+   <mapping>
+     <copy/>
+     <rename>
+       <field source="metatag.description" dest="description"/>
+       <field source="metatag.keywords" dest="keywords"/>
+     </rename>
+     <remove>
+       <field source="segment"/>
+     </remove>
+   </mapping>
+ </writer>
+ <writer id="indexer_rabbit_1" class="org.apache.nutch.indexwriter.rabbit.RabbitIndexWriter">
+   <parameters>
+     <!-- Parameters here -->
+   </parameters>
+   <mapping>
+     <copy>
+       <field source="title" dest="search"/>
+     </copy>
+     <rename>
+       <field source="metatag.description" dest="description"/>
+       <field source="metatag.keywords" dest="keywords"/>
+       <field source="domain" dest="domain_name"/>
+     </rename>
+     <remove />
+   </mapping>
+ </writer>
+ }}}
+ 
+ For `indexer-solr`, we'll get documents like:
+ 
+ {{{#!highlight properties
+ host: "www.example.org"
+ domain: "example.org"
+ title: "Example page"
+ description: "Example page description"
+ keywords: ["example", "page"]
+ }}}
+ 
+ For `indexer-rabbit`, the document's structure is:
+ 
+ {{{#!highlight properties
+ host: "www.example.org"
+ domain_name: "example.org"
+ title: "Example page"
+ search: "Example page"
+ description: "Example page description"
+ keywords: ["example", "page"]
+ segment: 20180621163128
+ }}}
  == Parameters section ==
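The copy/rename/remove rules in this change can be checked against the use case with a small sketch. The Python below is not Nutch's implementation; it models a document as a plain dict and uses a hypothetical helper, `apply_mapping`. It assumes that the three actions run in the order copy, rename, remove, and that a rule whose source field is missing is simply skipped. When a `dest` field already exists, this sketch overwrites it; it does not model how Nutch handles that case.

```python
from typing import Any, Dict, List, Tuple

# Assumption: a document is modelled as a plain dict of field name -> value.
Document = Dict[str, Any]


def apply_mapping(
    doc: Document,
    copy: List[Tuple[str, str]],
    rename: List[Tuple[str, str]],
    remove: List[str],
) -> Document:
    """Hypothetical helper: apply <copy>, <rename> and <remove> rules to a copy of doc.

    Assumed order: copy, then rename, then remove. The input is not modified.
    """
    mapped = dict(doc)  # shallow copy, so every writer starts from the original document

    # <copy>: keep the source field and duplicate its value under dest.
    for source, dest in copy:
        if source in mapped:
            mapped[dest] = mapped[source]

    # <rename>: move the value from source to dest; the source field disappears.
    for source, dest in rename:
        if source in mapped:
            mapped[dest] = mapped.pop(source)

    # <remove>: drop the field if it is present.
    for source in remove:
        mapped.pop(source, None)

    return mapped


# The hypothetical document from the use case.
document = {
    "host": "www.example.org",
    "domain": "example.org",
    "title": "Example page",
    "metatag.description": "Example page description",
    "metatag.keywords": ["example", "page"],
    "segment": 20180621163128,
}

# Mapping of writer "indexer_solr_1".
solr = apply_mapping(
    document,
    copy=[],
    rename=[("metatag.description", "description"), ("metatag.keywords", "keywords")],
    remove=["segment"],
)

# Mapping of writer "indexer_rabbit_1".
rabbit = apply_mapping(
    document,
    copy=[("title", "search")],
    rename=[
        ("metatag.description", "description"),
        ("metatag.keywords", "keywords"),
        ("domain", "domain_name"),
    ],
    remove=[],
)

print(sorted(solr))    # ['description', 'domain', 'host', 'keywords', 'title']
print(sorted(rabbit))  # ['description', 'domain_name', 'host', 'keywords', 'search', 'segment', 'title']
```

Because each call starts from its own copy of the input, renaming `domain` for the RabbitMQ writer does not change what the Solr writer receives, which matches the two result documents in the use case.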

