Aww you removed my sarcasm. Also, I think you committed bits with references to "index-urlmeta". That might have been my bad for leaving it in.
I changed it to just "urlmeta" as it's both an indexing and a scoring filter. I think the comments need to be adjusted to reflect that, else I may be the target of a hit-and-run.

Sent from my iPhone

On Jul 25, 2010, at 10:51 AM, "Chris A. Mattmann (JIRA)" <[email protected]> wrote:

>
> [ https://issues.apache.org/jira/browse/NUTCH-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Chris A. Mattmann resolved NUTCH-855.
> -------------------------------------
>
> Fix Version/s: (was: 2.0)
> Resolution: Fixed
>
> - Applied to 1.2-branch in r979079. Cleaned up comments, removed author tags
> (Nutch decided a long time ago that the project would move away from author
> tags), cleaned up formatting. Patch doesn't apply to trunk or Nutchbase
> branch because LuceneWriter doesn't exist anymore for Nutch 2.0. If someone
> wants to port this to Nutchbase-ville, by all means, but if so, please open a
> new issue for it. Thanks very much, Scott!
>
>> ScoringFilter and IndexingFilter: To allow for the propagation of URL
>> Metatags and their subsequent indexing.
>> -------------------------------------------------------------------------------------------------------------
>>
>> Key: NUTCH-855
>> URL: https://issues.apache.org/jira/browse/NUTCH-855
>> Project: Nutch
>> Issue Type: New Feature
>> Components: generator, indexer
>> Affects Versions: 1.1
>> Reporter: Scott Gonyea
>> Assignee: Chris A. Mattmann
>> Fix For: 1.2
>>
>> Attachments: nutch-855.txt
>>
>> Original Estimate: 168h
>> Remaining Estimate: 168h
>>
>> This plugin is designed to enhance the NUTCH-655 patch, by doing two things:
>> 1. Meta Tags that are supplied with your Crawl URLs, during injection, will
>> be propagated throughout the outlinks of those Crawl URLs.
>> 2. When you index your URLs, the meta tags that you specified with your URLs
>> will be indexed alongside those URLs--and can be directly queried, assuming
>> you have done everything else correctly.
>> The flat-file of URLs you are injecting should, per NUTCH-655, be
>> tab-delimited in the form of:
>> www.url.com\tkey1=value1\tkey2=value2\t...\tkeyN=valueN
>> or:
>> http://slashdot.org/ corp_owner=Geeknet will_it_blend=indubitably
>> http://engadget.com/ corp_owner=Weblogs genre=geeksquad_thriller
>> To activate this plugin, you must modify two properties in your
>> nutch-site.xml:
>> 1. plugin.includes
>>    add: urlmeta
>>    to: <value>...</value>
>>    i.e.: <value>urlmeta|parse-tika|scoring-opic|...</value>
>> 2. urlmeta.tags
>>    Insert a comma-delimited list of metatags. Using the above example:
>>    <value>corp_owner, will_it_blend, genre</value>
>> Note that you do not need to include the tag with every URL. However, you
>> must specify each tag if you want it to be propagated and later indexed.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
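For anyone trying to picture the seed-file format from the quoted issue, here is a rough sketch in Python of how one tab-delimited line might be split into a URL and its metatags, keeping only the tags listed in urlmeta.tags. This is not the plugin's code (the plugin itself is Java); the function names parse_tag_list and parse_seed_line are made up for illustration, and the filtering behaviour is an assumption based on the description above.

```python
def parse_tag_list(value):
    """Split a comma-delimited urlmeta.tags value into clean tag names."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def parse_seed_line(line, allowed_tags=None):
    """Split one tab-delimited seed line into (url, metadata dict).

    Hypothetical helper: fields after the URL are expected to look like
    key=value. Keys not in allowed_tags are dropped, which mirrors the note
    that only tags named in urlmeta.tags get propagated and indexed.
    """
    fields = line.rstrip("\n").split("\t")
    url = fields[0].strip()
    meta = {}
    for field in fields[1:]:
        if "=" not in field:
            continue  # skip malformed fields rather than failing
        key, value = field.split("=", 1)
        key = key.strip()
        if allowed_tags is None or key in allowed_tags:
            meta[key] = value.strip()
    return url, meta


if __name__ == "__main__":
    tags = parse_tag_list("corp_owner, will_it_blend, genre")
    seed = "http://slashdot.org/\tcorp_owner=Geeknet\twill_it_blend=indubitably"
    print(parse_seed_line(seed, tags))
```

A tag that appears on a seed line but is missing from the urlmeta.tags list is simply ignored in this sketch, and a URL with no tags at all yields an empty dict.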

