[ 
https://issues.apache.org/jira/browse/NUTCH-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann resolved NUTCH-855.
-------------------------------------

    Fix Version/s:     (was: 2.0)
       Resolution: Fixed

- Applied to 1.2-branch in r979079. Cleaned up comments, removed author tags 
(Nutch decided a long time ago that the project would move away from author 
tags), cleaned up formatting. Patch doesn't apply to trunk or Nutchbase branch 
because LuceneWriter doesn't exist anymore for Nutch 2.0. If someone wants to 
port this to Nutchbase-ville, by all means, but if so, please open a new issue 
for it. Thanks very much, Scott!

> ScoringFilter and IndexingFilter: To allow for the propagation of URL 
> Metatags and their subsequent indexing.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-855
>                 URL: https://issues.apache.org/jira/browse/NUTCH-855
>             Project: Nutch
>          Issue Type: New Feature
>          Components: generator, indexer
>    Affects Versions: 1.1
>            Reporter: Scott Gonyea
>            Assignee: Chris A. Mattmann
>             Fix For: 1.2
>
>         Attachments: nutch-855.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> This plugin is designed to enhance the NUTCH-655 patch by doing two things:
> 1. Metatags that are supplied with your crawl URLs during injection will 
> be propagated to the outlinks of those crawl URLs.
> 2. When you index your URLs, the metatags that you specified with your URLs 
> will be indexed alongside those URLs and can be queried directly, assuming 
> the rest of your configuration is correct.
> The flat-file of URLs you are injecting should, per NUTCH-655, be 
> tab-delimited in the form of:
> www.url.com\tkey1=value1\tkey2=value2\t...\tkeyN=valueN
> or:
> http://slashdot.org/  corp_owner=Geeknet      will_it_blend=indubitably
> http://engadget.com/  corp_owner=Weblogs      genre=geeksquad_thriller
> To activate this plugin, you must modify two properties in your 
> nutch-site.xml:
> 1. plugin.includes
>    add: urlmeta
>    to:   <value>...</value>
>    e.g.: <value>urlmeta|parse-tika|scoring-opic|...</value>
> 2. urlmeta.tags
>    Insert a comma-delimited list of metatags. Using the above example:
>    <value>corp_owner, will_it_blend, genre</value>
>    Note that you do not need to include every tag with every URL. However, 
> you must list each tag in urlmeta.tags if you want it to be propagated and 
> later indexed.
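
As an illustration of the seed-file format and the urlmeta.tags list described in the quoted issue, here is a minimal Python sketch. It is not the plugin's code (the plugin itself is written in Java); the function names parse_seed_line, parse_tag_list and select_configured_tags are hypothetical, and the whitespace trimming is an assumption about how the comma-delimited list is read.

```python
def parse_seed_line(line):
    """Split one tab-delimited seed line into (url, metadata dict)."""
    fields = line.rstrip("\n").split("\t")
    url = fields[0].strip()
    meta = {}
    for field in fields[1:]:
        if "=" not in field:
            continue  # skip fields that are not key=value pairs
        key, value = field.split("=", 1)
        meta[key.strip()] = value.strip()
    return url, meta


def parse_tag_list(value):
    """Split a comma-delimited urlmeta.tags value (assumption: whitespace is trimmed)."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def select_configured_tags(meta, tags):
    """Keep only the metadata keys that appear in the configured tag list."""
    return {key: meta[key] for key in tags if key in meta}


if __name__ == "__main__":
    seed = "http://slashdot.org/\tcorp_owner=Geeknet\twill_it_blend=indubitably"
    url, meta = parse_seed_line(seed)
    tags = parse_tag_list("corp_owner, will_it_blend, genre")
    # "genre" is configured but absent from this URL, which is allowed.
    print(url, select_configured_tags(meta, tags))
```

The sketch mirrors the note in the issue: a URL may omit some tags, but a tag that is missing from the configured list is dropped rather than carried forward.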

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
