Configurable indexing plugin (index-extra) 
-------------------------------------------

                 Key: NUTCH-1264
                 URL: https://issues.apache.org/jira/browse/NUTCH-1264
             Project: Nutch
          Issue Type: Improvement
          Components: indexer
    Affects Versions: 1.5
            Reporter: Julien Nioche


Several plugins that are already distributed or proposed do very similar 
things:
- parse-meta [NUTCH-809] to generate metadata fields in parse-metadata and 
index them
- headings [NUTCH-1005] to generate headings fields in parse-metadata and index 
them
- index-extra [NUTCH-422] to index configurable fields 
- urlmeta [NUTCH-855] to propagate metadata from the seeds to the outlinks and 
index them
- index-static [NUTCH-940] to generate configurable static fields 

All of these plugins extract information from various sources and generate 
fields from it, so they are largely redundant. Instead, this issue proposes to 
have a single plugin generate configurable fields from:
- static values
- parse metadata
- content metadata
- crawldb metadata

and let the other plugins focus on parsing and extracting the values to index. 
This will make adding new fields simpler, since they can rely on one stable 
common plugin instead of duplicating the code across several plugins.
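
To make the idea concrete, here is a minimal sketch of how such a plugin could 
map configured rules onto the four sources listed above. It is only an 
illustration under stated assumptions: the class ConfigurableFields, the Source 
enum, the Rule type and the generate method are all hypothetical names, plain 
maps stand in for the real parse, content and crawldb metadata, and none of it 
is the actual plugin code or the Nutch API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the NUTCH-1264 implementation or the Nutch API. */
class ConfigurableFields {

    /** The four sources named in the proposal. */
    enum Source { STATIC, PARSE_META, CONTENT_META, CRAWLDB_META }

    /**
     * One configured rule: fill index field `field` from `key` in `source`.
     * For STATIC rules, `key` holds the literal value itself.
     */
    static final class Rule {
        final String field;
        final Source source;
        final String key;

        Rule(String field, Source source, String key) {
            this.field = field;
            this.source = source;
            this.key = key;
        }
    }

    /**
     * Applies the rules in order and returns the generated fields.
     * `metadata` is a stand-in for the per-document metadata of each source.
     * Rules whose key is absent from their source produce no field.
     */
    static Map<String, String> generate(List<Rule> rules,
                                        Map<Source, Map<String, String>> metadata) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Rule rule : rules) {
            String value;
            if (rule.source == Source.STATIC) {
                value = rule.key;
            } else {
                Map<String, String> source = metadata.getOrDefault(
                        rule.source, Collections.<String, String>emptyMap());
                value = source.get(rule.key);
            }
            if (value != null) {
                fields.put(rule.field, value);
            }
        }
        return fields;
    }
}
```

With this shape, adding a new index field would only mean adding a rule to the 
configuration, while the plugin that extracts the value only has to store it in 
one of the metadata sources.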

This plugin will replace index-static [NUTCH-940] and index-extra [NUTCH-422] 
and will serve as a basis for further improvements.



