[
https://issues.apache.org/jira/browse/NUTCH-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202068#comment-13202068
]
Hudson commented on NUTCH-1264:
-------------------------------
Integrated in Nutch-trunk #1751 (See
[https://builds.apache.org/job/Nutch-trunk/1751/])
NUTCH-1264 Index-metadata
jnioche :
http://svn.apache.org/viewvc/nutch/trunk/viewvc/?view=rev&root=.&revision=1241074
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/conf/nutch-default.xml
* /nutch/trunk/src/plugin/build.xml
* /nutch/trunk/src/plugin/index-metadata
* /nutch/trunk/src/plugin/index-metadata/build.xml
* /nutch/trunk/src/plugin/index-metadata/ivy.xml
* /nutch/trunk/src/plugin/index-metadata/plugin.xml
* /nutch/trunk/src/plugin/index-metadata/src
* /nutch/trunk/src/plugin/index-metadata/src/java
* /nutch/trunk/src/plugin/index-metadata/src/java/org
* /nutch/trunk/src/plugin/index-metadata/src/java/org/apache
* /nutch/trunk/src/plugin/index-metadata/src/java/org/apache/nutch
* /nutch/trunk/src/plugin/index-metadata/src/java/org/apache/nutch/indexer
* /nutch/trunk/src/plugin/index-metadata/src/java/org/apache/nutch/indexer/metadata
* /nutch/trunk/src/plugin/index-metadata/src/java/org/apache/nutch/indexer/metadata/MetadataIndexer.java
> Configurable indexing plugin (index-metadata)
> ----------------------------------------------
>
> Key: NUTCH-1264
> URL: https://issues.apache.org/jira/browse/NUTCH-1264
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.5
> Reporter: Julien Nioche
> Fix For: 1.5
>
> Attachments: NUTCH-1264-trunk-v2.patch, NUTCH-1264-trunk.patch
>
>
> We currently have several plugins, already distributed or proposed, which do
> very similar things:
> - parse-meta [NUTCH-809] to generate metadata fields in parse-metadata and
> index them
> - headings [NUTCH-1005] to generate headings fields in parse-metadata and
> index them
> - index-extra [NUTCH-422] to index configurable fields
> - urlmeta [NUTCH-855] to propagate metadata from the seeds to the outlinks
> and index them
> - index-static [NUTCH-940] to generate configurable static fields
> What all these plugins have in common is that they extract information from
> various sources and generate fields from it, so they are largely redundant.
> Instead, this issue proposes a single plugin that generates
> configurable fields from:
> - static values
> - parse metadata
> - content metadata
> - crawldb metadata
> and let the other plugins focus on parsing and extracting the values
> to index. This will make adding new fields simpler by relying on a
> stable common plugin instead of duplicating code across various plugins.
> This plugin will replace index-extra [NUTCH-422] and will serve as a basis
> for further improvements.
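
The idea described above, one component that builds index fields from static
values, parse metadata, content metadata and crawldb metadata, can be sketched
roughly as follows. This is only an illustration under stated assumptions: it
uses plain Java maps rather than any Nutch API, and the class name, method
names and sample keys are all hypothetical and not taken from
MetadataIndexer.java.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: none of these names come from the Nutch code base.
public class ConfigurableFieldSketch {

    // Copy only the configured keys from one metadata source into the output.
    static void copyConfigured(Map<String, String> source, List<String> keys,
                               Map<String, String> fields) {
        for (String key : keys) {
            String value = source.get(key);
            if (value != null) {
                fields.put(key, value);
            }
        }
    }

    // Build the fields to index from the four sources named in the issue.
    // Assumption: on a key collision, a later source overwrites an earlier one.
    static Map<String, String> buildFields(
            Map<String, String> staticFields,
            Map<String, String> parseMeta, List<String> parseKeys,
            Map<String, String> contentMeta, List<String> contentKeys,
            Map<String, String> dbMeta, List<String> dbKeys) {
        Map<String, String> fields = new LinkedHashMap<>(staticFields);
        copyConfigured(parseMeta, parseKeys, fields);
        copyConfigured(contentMeta, contentKeys, fields);
        copyConfigured(dbMeta, dbKeys, fields);
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for one crawled document.
        Map<String, String> staticFields = new HashMap<>();
        staticFields.put("collection", "demo");

        Map<String, String> parseMeta = new HashMap<>();
        parseMeta.put("description", "A sample page");
        parseMeta.put("keywords", "not configured, so not indexed");

        Map<String, String> contentMeta = new HashMap<>();
        contentMeta.put("Content-Type", "text/html");

        Map<String, String> dbMeta = new HashMap<>();
        dbMeta.put("seed", "http://example.org/");

        Map<String, String> fields = buildFields(
                staticFields,
                parseMeta, Arrays.asList("description"),
                contentMeta, Arrays.asList("Content-Type"),
                dbMeta, Arrays.asList("seed"));

        System.out.println(fields);
    }
}
```

In this sketch the parsing plugins would only need to populate the metadata
maps; which keys become index fields is decided purely by the configured key
lists.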