Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "WritingPluginExample" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/WritingPluginExample?action=diff&rev1=17&rev2=18

+ = Writing a Nutch Plugin =
+
+ == Introduction ==
+ This plugin example focuses on the urlmeta plugin, which is packaged with Nutch-1.3. It aims to provide a comprehensive introduction to plugin development for Apache Nutch.
+
+ <<TableOfContents(3)>>

  == The Example ==
  Consider this as a plugin example: We want to be able to recommend specific web pages for given search terms. For this example we'll assume we're crawling this site with Nutch and indexing it with Apache Solr. As you may have noticed, there are a number of pages that talk about plugins. If someone searches for the term "plugin", we want the first hit returned to be the Nutch PluginCentral page; however, we also want to return all the normal hits in the expected ranking.
@@ -9, +15 @@
  1. Meta tags that are supplied with your crawl URLs during injection will be propagated throughout the outlinks of those crawl URLs.
  2. When you index your URLs, the meta tags that you specified with your URLs will be indexed alongside those URLs, and can be queried directly, assuming you have done everything else correctly.

- In order to do this we go through our site and add meta-tags to pages that list what terms they should be recommended for. The tags look something like this:
+ The first step is to go through our site and add meta tags to pages, listing the terms each page should be recommended for. The tags look something like this:
  {{{
  <meta name="recommended" content="plugins" />
  }}}

- In order to do this we need to write a plugin that extends 2 different extension points. Firstly we need to extend the [[http://nutch.apache.org/apidocs-1.3/org/apache/nutch/indexer/IndexingFilter.html|IndexingFilter]] by creating an URLMetaIndexingFilter as we need to add any additional meta-tags to the index.
- Secondly we need to extend the [[http://nutch.apache.org/apidocs-1.3/org/apache/nutch/scoring/ScoringFilter.html|ScoringFilter]] by creating an URLMetaScoringFilter. The idea here is that this will take the metatags we have listed in our "urlmeta.tags" property, and looks for them inside the parseData object. This allows us to match recommended terms out of the meta tags. If there was no property within nutch-default.xml for us to specify these terms we would be required to add the new field to our Nutch schema.xml which would also add the ability to search against the new field in the index.
+ To do this we need to write a plugin that extends two different extension points. First, we need to extend the [[http://nutch.apache.org/apidocs-1.3/org/apache/nutch/indexer/IndexingFilter.html|IndexingFilter]] by creating a URLMetaIndexingFilter, because any additional meta tags have to be added to the index. Second, we need to extend the [[http://nutch.apache.org/apidocs-1.3/org/apache/nutch/scoring/ScoringFilter.html|ScoringFilter]] by creating a URLMetaScoringFilter. The idea here is that this filter takes the meta tags we have listed in our "urlmeta.tags" property and looks for them inside the !parseData object. This allows us to extract the recommended terms from the meta tags. If there were no property within nutch-default.xml for us to specify these terms, we would be required to add the new field to our Nutch schema.xml, which would similarly add the ability to search against the new field in the index.

  == Setup ==
  Start by [[http://svn.apache.org/repos/asf/nutch/tags/release-1.3/|downloading]] the Nutch-1.3 source code. Once you have it, make sure it compiles as-is before you make any changes. You should be able to compile it by running ant from the directory you downloaded the source to. If you have trouble, you can write to one of the [[Mailing|Mailing Lists]].
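For readers who want to see the data flow behind the two extension points described above, here is a minimal, Nutch-independent sketch in plain Java. It is not the urlmeta plugin's source and does not use the Nutch API. The class UrlMetaSketch and its methods parseTagNames, propagateToOutlinks and buildIndexFields are hypothetical, plain maps stand in for the metadata Nutch carries with each URL, and the "urlmeta.tags" value is assumed to be a comma-separated list of tag names. The scoring stage copies only the configured tags from a page to its outlinks; the indexing stage turns those same tags into fields of the document sent to the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the urlmeta idea using plain maps.
 * None of these names belong to the Nutch API.
 */
public class UrlMetaSketch {

    /** Parses an assumed comma-separated "urlmeta.tags" value, e.g. "recommended,keywords". */
    public static List<String> parseTagNames(String property) {
        List<String> names = new ArrayList<>();
        for (String name : property.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    /** Scoring-filter stage: each outlink inherits only the configured tags of its parent. */
    public static Map<String, Map<String, String>> propagateToOutlinks(
            Map<String, String> parentMeta, List<String> outlinks, List<String> tagNames) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String outlink : outlinks) {
            Map<String, String> inherited = new HashMap<>();
            for (String tag : tagNames) {
                String value = parentMeta.get(tag);
                if (value != null) {
                    inherited.put(tag, value);
                }
            }
            result.put(outlink, inherited);
        }
        return result;
    }

    /** Indexing-filter stage: configured tags become fields of the index document. */
    public static Map<String, String> buildIndexFields(
            String url, Map<String, String> pageMeta, List<String> tagNames) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", url);
        for (String tag : tagNames) {
            String value = pageMeta.get(tag);
            if (value != null) {
                doc.put(tag, value);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        List<String> tags = parseTagNames("recommended, keywords");

        // Example metadata attached to a seed URL at injection time.
        Map<String, String> seedMeta = new HashMap<>();
        seedMeta.put("recommended", "plugins");
        seedMeta.put("unrelated", "ignored");

        Map<String, Map<String, String>> outlinkMeta = propagateToOutlinks(
                seedMeta, Arrays.asList("http://example.org/a", "http://example.org/b"), tags);

        String first = "http://example.org/a";
        // Prints {url=http://example.org/a, recommended=plugins}
        System.out.println(buildIndexFields(first, outlinkMeta.get(first), tags));
    }
}
```

The "unrelated" key is dropped because it is not listed in the tags, and "keywords" is absent because the seed page never carried it. In the real plugin this logic lives in the URLMetaScoringFilter and URLMetaIndexingFilter classes that extend Nutch's ScoringFilter and IndexingFilter extension points, not in static helpers like these.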

