Aww, you removed my sarcasm. Also, I think you committed bits that still 
reference "index-urlmeta". That was probably my fault for leaving it in.

I changed it to just "urlmeta" as it's both an indexing and a scoring filter. I 
think the comments need to be adjusted to reflect that, else I may be the 
target of a hit-and-run.


On Jul 25, 2010, at 10:51 AM, "Chris A. Mattmann (JIRA)" <[email protected]> 
wrote:

> 
>     [ https://issues.apache.org/jira/browse/NUTCH-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> 
> Chris A. Mattmann resolved NUTCH-855.
> -------------------------------------
> 
>    Fix Version/s:     (was: 2.0)
>       Resolution: Fixed
> 
> - Applied to 1.2-branch in r979079. Cleaned up comments, removed author tags 
> (Nutch decided a long time ago that the project would move away from author 
> tags), cleaned up formatting. Patch doesn't apply to trunk or Nutchbase 
> branch because LuceneWriter doesn't exist anymore for Nutch 2.0. If someone 
> wants to port this to Nutchbase-ville, by all means, but if so, please open a 
> new issue for it. Thanks very much, Scott!
> 
>> ScoringFilter and IndexingFilter: To allow for the propagation of URL 
>> Metatags and their subsequent indexing.
>> -------------------------------------------------------------------------------------------------------------
>> 
>>                Key: NUTCH-855
>>                URL: https://issues.apache.org/jira/browse/NUTCH-855
>>            Project: Nutch
>>         Issue Type: New Feature
>>         Components: generator, indexer
>>   Affects Versions: 1.1
>>           Reporter: Scott Gonyea
>>           Assignee: Chris A. Mattmann
>>            Fix For: 1.2
>> 
>>        Attachments: nutch-855.txt
>> 
>>  Original Estimate: 168h
>> Remaining Estimate: 168h
>> 
>> This plugin is designed to enhance the NUTCH-655 patch, by doing two things:
>> 1. Meta Tags that are supplied with your Crawl URLs, during injection, will 
>> be propagated throughout the outlinks of those Crawl URLs.
>> 2. When you index your URLs, the meta tags that you specified with your URLs 
>> will be indexed alongside those URLs--and can be directly queried, assuming 
>> you have done everything else correctly.
>> The flat-file of URLs you are injecting should, per NUTCH-655, be 
>> tab-delimited in the form of:
>> www.url.com\tkey1=value1\tkey2=value2\t...\tkeyN=valueN
>> for example (fields separated by tabs):
>> http://slashdot.org/    corp_owner=Geeknet    will_it_blend=indubitably
>> http://engadget.com/    corp_owner=Weblogs    genre=geeksquad_thriller
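To make the seed-file format above concrete, here is a minimal Java sketch of how one such line could be split into a URL and its key=value pairs. This is not the injector's or the plugin's actual code: the SeedLine class and its parse method are hypothetical names made up for illustration, and it assumes the fields are separated by real tab characters (the archive shows them as spaces) and that a field without a key or an '=' is simply skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: splits one injected seed line
// into its URL and its key=value metadata pairs.
final class SeedLine {
    final String url;
    final Map<String, String> meta;

    private SeedLine(String url, Map<String, String> meta) {
        this.url = url;
        this.meta = meta;
    }

    static SeedLine parse(String line) {
        // Fields are tab-delimited; the first field is the URL.
        String[] fields = line.split("\t");
        Map<String, String> meta = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            String field = fields[i].trim();
            int eq = field.indexOf('=');
            if (eq <= 0) {
                continue; // assumption: skip fields with no key or no '='
            }
            // Everything after the first '=' is treated as the value.
            meta.put(field.substring(0, eq), field.substring(eq + 1));
        }
        return new SeedLine(fields[0].trim(), meta);
    }

    public static void main(String[] args) {
        SeedLine s = parse("http://slashdot.org/\tcorp_owner=Geeknet\twill_it_blend=indubitably");
        System.out.println(s.url + " " + s.meta);
    }
}
```

A LinkedHashMap is used only so the tags keep the order they had in the file; nothing in the issue says order matters.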
>> To activate this plugin, you must modify two properties in your 
>> nutch-site.xml:
>> 1. plugin.includes
>>   add: urlmeta
>>   to:   <value>...</value>
>>   e.g.: <value>urlmeta|parse-tika|scoring-opic|...</value>
>> 2. urlmeta.tags
>>   Insert a comma-delimited list of metatags. Using the above example:
>>   <value>corp_owner, will_it_blend, genre</value>
>>   Note that you do not need to include the tag with every URL. However, you 
>> must specify each tag if you want it to be propagated and later indexed.
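The note about urlmeta.tags can be read as a simple rule: only tags named in the property are carried along, and a URL that lacks one of them is fine. The Java sketch below illustrates that rule under stated assumptions. UrlMetaTags, parseTags, and select are hypothetical names, not the plugin's API, and trimming the whitespace around each tag is an assumption based on the "corp_owner, will_it_blend, genre" example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the urlmeta plugin's code.
final class UrlMetaTags {

    // Splits a comma-delimited urlmeta.tags value into individual tag names.
    // Assumption: whitespace around each tag is ignored.
    static List<String> parseTags(String value) {
        List<String> tags = new ArrayList<>();
        for (String part : value.split(",")) {
            String tag = part.trim();
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    // Keeps only the metadata whose key is a configured tag. Tags that a
    // given URL does not carry are skipped; unlisted keys are dropped.
    static Map<String, String> select(Map<String, String> urlMeta, List<String> tags) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (String tag : tags) {
            String value = urlMeta.get(tag);
            if (value != null) {
                kept.put(tag, value);
            }
        }
        return kept;
    }
}
```

With the engadget line from the example, select would keep corp_owner and genre and leave will_it_blend out, since that URL never supplied it.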
> 
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
> 
