ScoringFilter and IndexingFilter: To allow for the propagation of URL Metatags 
and their subsequent indexing.
-------------------------------------------------------------------------------------------------------------

                 Key: NUTCH-855
                 URL: https://issues.apache.org/jira/browse/NUTCH-855
             Project: Nutch
          Issue Type: New Feature
          Components: generator, indexer
    Affects Versions: 1.1
            Reporter: Scott Gonyea
             Fix For: 1.2


This plugin is designed to enhance the NUTCH-655 patch by doing two things:
1. Meta tags that are supplied with your crawl URLs during injection will be 
propagated throughout the outlinks of those crawl URLs.
2. When you index your URLs, the meta tags that you specified with your URLs 
will be indexed alongside those URLs and can be queried directly, assuming the 
rest of your configuration is correct.

The flat file of URLs you are injecting should, per NUTCH-655, be tab-delimited 
in the form of:
[www.url.com]\t[key1]=[value1]\t[key2]=[value2]...[keyN]=[valueN]
for example:
http://slashdot.org/    corp_owner=Geeknet      will_it_blend=indubitably
http://engadget.com/    corp_owner=Weblogs      genre=geeksquad_thriller

To activate this plugin, you must modify two properties in your nutch-site.xml:
1. plugin.includes
   from: index-(basic|anchor)
   to:   index-(basic|anchor|urlmeta)
2. urlmeta.tags
   Insert a comma-delimited list of metatags. Using the above example:
   <value>corp_owner, will_it_blend, genre</value>
   Note that you do not need to include every tag with every URL. However, you 
   must list each tag in urlmeta.tags if you want it to be propagated and 
   later indexed.
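
The effect of the tag list can be sketched as follows. This is an 
illustration of the rule described above, not the plugin's source: the class 
UrlMetaSketch and its methods are hypothetical, and trimming whitespace 
around each comma-separated tag is an assumption suggested by the spaces in 
the example value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the plugin's implementation.
final class UrlMetaSketch {

    // Splits a urlmeta.tags-style value on commas and trims each tag
    // (trimming is an assumption).
    static List<String> parseTags(String value) {
        List<String> tags = new ArrayList<>();
        for (String raw : value.split(",")) {
            String tag = raw.trim();
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    // Copies to an outlink only those parent tags that appear in the list.
    // Tags absent from the list, or absent from the parent, are left out.
    static Map<String, String> propagate(Map<String, String> parentMeta,
                                         List<String> tags) {
        Map<String, String> outlinkMeta = new LinkedHashMap<>();
        for (String tag : tags) {
            String value = parentMeta.get(tag);
            if (value != null) {
                outlinkMeta.put(tag, value);
            }
        }
        return outlinkMeta;
    }

    public static void main(String[] args) {
        List<String> tags = parseTags("corp_owner, will_it_blend, genre");

        Map<String, String> parent = new LinkedHashMap<>();
        parent.put("corp_owner", "Geeknet");
        parent.put("will_it_blend", "indubitably");
        parent.put("unlisted_tag", "dropped"); // not in the tag list

        System.out.println(propagate(parent, tags));
    }
}
```

In this sketch the slashdot.org entry carries no genre tag, so none is 
copied, and unlisted_tag is dropped because it does not appear in the 
configured list.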


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
