[
https://issues.apache.org/jira/browse/NUTCH-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201392#comment-13201392
]
Markus Jelsma commented on NUTCH-1264:
--------------------------------------
Works fine!
> Configurable indexing plugin (index-metadata)
> ----------------------------------------------
>
> Key: NUTCH-1264
> URL: https://issues.apache.org/jira/browse/NUTCH-1264
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.5
> Reporter: Julien Nioche
> Attachments: NUTCH-1264-trunk-v2.patch, NUTCH-1264-trunk.patch
>
>
> We currently have several plugins, already distributed or proposed, which do
> very similar things:
> - parse-meta [NUTCH-809] to generate metadata fields in parse-metadata and
> index them
> - headings [NUTCH-1005] to generate headings fields in parse-metadata and
> index them
> - index-extra [NUTCH-422] to index configurable fields
> - urlmeta [NUTCH-855] to propagate metadata from the seeds to the outlinks
> and index them
> - index-static [NUTCH-940] to generate configurable static fields
> All these plugins extract information from various sources and generate
> fields from it, which makes them largely redundant.
> Instead, this issue proposes to have a single plugin that generates
> configurable fields from:
> - static values
> - parse metadata
> - content metadata
> - crawldb metadata
> and to let the other plugins focus on parsing and extracting the values
> to index. This will make adding new fields simpler, because it relies on a
> stable common plugin instead of duplicating code across various plugins.
> This plugin will replace index-extra [NUTCH-422] and will serve as a basis
> for further improvements.
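
For illustration only, the idea in the description above (one plugin that
builds index fields from static values, parse metadata, content metadata and
crawldb metadata, driven by configuration) could be sketched roughly as
follows. This is not the code from the attached patches. It uses plain Java
maps in place of the Nutch classes, and every name in it (IndexMetadataSketch,
buildFields, copyConfigured, the sample keys) is a hypothetical stand-in.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for the Nutch metadata containers.
final class IndexMetadataSketch {

    // Copy only the configured keys from one metadata source into the document.
    static void copyConfigured(List<String> keys, Map<String, String> source,
                               Map<String, String> doc) {
        for (String key : keys) {
            String value = source.get(key);
            if (value != null) {
                doc.put(key, value);
            }
        }
    }

    // Build the index fields from the four sources named in the issue.
    // The key lists play the role of configuration (assumed, not the real
    // property names).
    static Map<String, String> buildFields(
            Map<String, String> staticFields,
            List<String> parseKeys, Map<String, String> parseMeta,
            List<String> contentKeys, Map<String, String> contentMeta,
            List<String> dbKeys, Map<String, String> crawlDbMeta) {
        Map<String, String> doc = new LinkedHashMap<>(staticFields);
        copyConfigured(parseKeys, parseMeta, doc);
        copyConfigured(contentKeys, contentMeta, doc);
        copyConfigured(dbKeys, crawlDbMeta, doc);
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> staticFields = new LinkedHashMap<>();
        staticFields.put("collection", "demo");

        Map<String, String> parseMeta = new LinkedHashMap<>();
        parseMeta.put("description", "A sample page");
        parseMeta.put("ignored", "not configured, so not indexed");

        Map<String, String> contentMeta = new LinkedHashMap<>();
        contentMeta.put("Content-Language", "en");

        Map<String, String> crawlDbMeta = new LinkedHashMap<>();
        crawlDbMeta.put("seed-label", "news");

        Map<String, String> doc = buildFields(
                staticFields,
                Arrays.asList("description"), parseMeta,
                Arrays.asList("Content-Language"), contentMeta,
                Arrays.asList("seed-label"), crawlDbMeta);

        for (Map.Entry<String, String> field : doc.entrySet()) {
            System.out.println(field.getKey() + " = " + field.getValue());
        }
    }
}
```

In this sketch the other plugins would only be responsible for putting values
into the metadata maps, and adding a new index field would mean adding a key to
one of the configured lists rather than writing another plugin.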