Sorry about the spam, everyone. I hope my patch didn't suck too much :).

On Wed, Jul 14, 2010 at 6:53 PM, Scott Gonyea (JIRA) <[email protected]> wrote:
>
> [ https://issues.apache.org/jira/browse/NUTCH-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Scott Gonyea updated NUTCH-855:
> -------------------------------
>
>     Attachment: nutch-855.txt
>
> > ScoringFilter and IndexingFilter: To allow for the propagation of URL Metatags and their subsequent indexing.
> > -------------------------------------------------------------------------------------------------------------
> >
> >                 Key: NUTCH-855
> >                 URL: https://issues.apache.org/jira/browse/NUTCH-855
> >             Project: Nutch
> >          Issue Type: New Feature
> >          Components: generator, indexer
> >    Affects Versions: 1.1
> >            Reporter: Scott Gonyea
> >             Fix For: 1.2
> >
> >         Attachments: nutch-855.txt
> >
> >   Original Estimate: 168h
> >  Remaining Estimate: 168h
> >
> > This plugin is designed to enhance the NUTCH-655 patch, by doing two things:
> > 1. Meta Tags that are supplied with your Crawl URLs, during injection, will be propagated throughout the outlinks of those Crawl URLs.
> > 2. When you index your URLs, the meta tags that you specified with your URLs will be indexed alongside those URLs--and can be directly queried, assuming you have done everything else correctly.
> > The flat-file of URLs you are injecting should, per NUTCH-655, be tab-delimited in the form of:
> > [www.url.com]\t[key1]=[value1]\t[key2]=[value2]...[keyN]=[valueN]
> > or:
> > http://slashdot.org/ corp_owner=Geeknet will_it_blend=indubitably
> > http://engadget.com/ corp_owner=Weblogs genre=geeksquad_thriller
> > To activate this plugin, you must modify two properties in your nutch-site.xml:
> > 1. plugin.includes
> >    from: index-(basic|anchor)
> >    to: index-(basic|anchor|urlmeta)
> > 2. urlmeta.tags
> >    Insert a comma-delimited list of metatags. Using the above example:
> >    <value>corp_owner, will_it_blend, genre</value>
> > Note that you do not need to include the tag with every URL. However, you must specify each tag if you want it to be propagated and later indexed.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
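
For anyone trying to picture the seed-file format and the urlmeta.tags whitelist described above, here is a rough sketch in Java. It is not code from the patch: the class SeedLine and its methods parse, parseTags, and retain are hypothetical names made up for illustration, and the real plugin works through Nutch's own injector and filter interfaces. It only shows how one tab-delimited line splits into a URL plus key=value pairs, and how a comma-delimited tag list would select which pairs are kept.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Nutch or the NUTCH-855 patch.
final class SeedLine {
    final String url;
    final Map<String, String> metadata;

    private SeedLine(String url, Map<String, String> metadata) {
        this.url = url;
        this.metadata = metadata;
    }

    // Splits "url<TAB>key1=value1<TAB>key2=value2..." into a URL and its metadata.
    static SeedLine parse(String line) {
        String[] fields = line.split("\t");
        String url = fields[0].trim();
        Map<String, String> metadata = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            String field = fields[i].trim();
            int eq = field.indexOf('=');
            if (eq <= 0) {
                continue; // skip fields with no '=' or with an empty key
            }
            metadata.put(field.substring(0, eq), field.substring(eq + 1));
        }
        return new SeedLine(url, metadata);
    }

    // Turns a comma-delimited value such as "corp_owner, will_it_blend, genre" into a set of tag names.
    static Set<String> parseTags(String value) {
        Set<String> tags = new LinkedHashSet<>();
        for (String tag : value.split(",")) {
            String trimmed = tag.trim();
            if (!trimmed.isEmpty()) {
                tags.add(trimmed);
            }
        }
        return tags;
    }

    // Returns only the metadata entries whose key appears in the configured tag set.
    Map<String, String> retain(Set<String> tags) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            if (tags.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

With this sketch, SeedLine.parse("http://engadget.com/\tcorp_owner=Weblogs\tgenre=geeksquad_thriller") yields two metadata entries, and calling retain with only corp_owner in the tag set drops genre. That mirrors the note in the issue: a tag does not have to appear on every URL, but a tag missing from urlmeta.tags would not be carried forward.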

