Configurable indexing plugin (index-extra)
-------------------------------------------
Key: NUTCH-1264
URL: https://issues.apache.org/jira/browse/NUTCH-1264
Project: Nutch
Issue Type: Improvement
Components: indexer
Affects Versions: 1.5
Reporter: Julien Nioche
We currently have several plugins, either already distributed or proposed, which
do very similar things:
- parse-meta [NUTCH-809] to generate metadata fields in parse-metadata and
index them
- headings [NUTCH-1005] to generate headings fields in parse-metadata and index
them
- index-extra [NUTCH-422] to index configurable fields
- urlmeta [NUTCH-855] to propagate metadata from the seeds to the outlinks and
index them
- index-static [NUTCH-940] to generate configurable static fields
What these plugins have in common is that they extract information from various
sources and generate fields from it; they are largely redundant.
Instead, this issue proposes to have a single plugin that generates configurable
fields from:
- static values
- parse metadata
- content metadata
- crawldb metadata
and to let the other plugins focus on parsing and extracting the values to
index. This will make adding new fields simpler, since new fields will rely on a
stable, common plugin instead of duplicating code across several plugins.
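To make the proposal more concrete, here is a minimal Java sketch of the idea:
one configuration maps each index field to one of the four sources listed above
(a static value, parse metadata, content metadata or crawldb metadata). It is
not the plugin's actual code. It uses plain Java maps instead of Nutch's own
metadata and document classes, and the configuration syntax
(`field=source:key`), the `Source` enum and the `parse`/`generate` methods are
all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConfigurableFieldsSketch {

    // Hypothetical field sources, mirroring the four sources in the proposal.
    enum Source { STATIC, PARSE_META, CONTENT_META, CRAWLDB_META }

    // One configured field: the index field name, where its value comes from,
    // and either the static value itself or the metadata key to look up.
    record FieldSpec(String field, Source source, String keyOrValue) {}

    // Parses a hypothetical config string such as
    // "site=static:example,lang=parse:language,type=content:Content-Type".
    static List<FieldSpec> parse(String config) {
        List<FieldSpec> specs = new ArrayList<>();
        for (String entry : config.split(",")) {
            String[] fieldAndRest = entry.trim().split("=", 2);
            if (fieldAndRest.length != 2 || !fieldAndRest[1].contains(":")) {
                throw new IllegalArgumentException("Malformed entry: " + entry);
            }
            String[] sourceAndKey = fieldAndRest[1].split(":", 2);
            Source source = switch (sourceAndKey[0]) {
                case "static" -> Source.STATIC;
                case "parse" -> Source.PARSE_META;
                case "content" -> Source.CONTENT_META;
                case "crawldb" -> Source.CRAWLDB_META;
                default -> throw new IllegalArgumentException(
                    "Unknown source: " + sourceAndKey[0]);
            };
            specs.add(new FieldSpec(fieldAndRest[0], source, sourceAndKey[1]));
        }
        return specs;
    }

    // Builds the index fields for one document from the configured specs.
    // Metadata keys that are absent are skipped rather than indexed as empty.
    static Map<String, String> generate(List<FieldSpec> specs,
                                        Map<String, String> parseMeta,
                                        Map<String, String> contentMeta,
                                        Map<String, String> crawlDbMeta) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (FieldSpec spec : specs) {
            String value = switch (spec.source()) {
                case STATIC -> spec.keyOrValue();
                case PARSE_META -> parseMeta.get(spec.keyOrValue());
                case CONTENT_META -> contentMeta.get(spec.keyOrValue());
                case CRAWLDB_META -> crawlDbMeta.get(spec.keyOrValue());
            };
            if (value != null) {
                fields.put(spec.field(), value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        List<FieldSpec> specs = parse("site=static:example,lang=parse:language,"
            + "type=content:Content-Type,score=crawldb:myscore");
        Map<String, String> fields = generate(specs,
            Map.of("language", "en"),
            Map.of("Content-Type", "text/html"),
            Map.of());
        // Prints {site=example, lang=en, type=text/html}; "score" is skipped
        // because the crawldb metadata map has no "myscore" entry.
        System.out.println(fields);
    }
}
```

Skipping absent keys is only one possible design choice; a real plugin might
instead fall back to a configured default value. The point of the sketch is that
adding a field becomes a configuration change rather than new plugin code.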
This plugin will replace index-static [NUTCH-940] and index-extra [NUTCH-422]
and will serve as a basis for further improvements.