In addition to Sebastian's mail: 2.x has an index-metadata filter. If you want to send any field that is in the page metadata to the index, you just list its name in the configuration.
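As a rough illustration of the idea (not the plugin's actual code), the sketch below copies the metadata entries whose names appear in a comma-separated list into document fields. It uses plain JDK types as hypothetical stand-ins: a `Map<String, ByteBuffer>` for the WebPage metadata and a `Map<String, String>` for the NutchDocument fields. The real plugin reads the list of names from a configuration property; check the plugin source or nutch-default.xml of your release for its exact name. Decoding the values as UTF-8 is also an assumption about how they were stored.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of what a config-driven metadata indexer does,
// using JDK types instead of Nutch's WebPage and NutchDocument.
final class MetadataCopySketch {

  private MetadataCopySketch() {}

  /**
   * Copies the metadata entries named in configuredNames (a comma-separated
   * list, as it might appear in a configuration property) into a field map.
   * Names that are absent from the page metadata are skipped.
   */
  static Map<String, String> copyConfiguredFields(String configuredNames,
                                                  Map<String, ByteBuffer> pageMetadata) {
    Map<String, String> fields = new LinkedHashMap<>();
    for (String rawName : configuredNames.split(",")) {
      String name = rawName.trim();
      if (name.isEmpty()) {
        continue; // tolerate stray commas and blanks in the configured list
      }
      ByteBuffer value = pageMetadata.get(name);
      if (value != null) {
        // duplicate() so decoding does not move the stored buffer's position
        fields.put(name, StandardCharsets.UTF_8.decode(value.duplicate()).toString());
      }
    }
    return fields;
  }
}
```

For example, with the names `"author, missing"` and page metadata containing `author` and `lang`, only `author` becomes a field: `lang` is not listed, and `missing` is not present in the metadata.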
I recommend you look at index-metadata.

Talat

On 2 Apr 2014 23:30, "Sebastian Nagel" <[email protected]> wrote:
> Hi Yann,
>
> > In the Parse type, we don't have "getData()", so we can't add new metadata.
> ...
> > So what is the new way to add a custom field to the index? Maybe I'm
> > missing something...
>
> In 2.x, data for custom fields can be added to the WebPage's metadata
> in a ParseFilter via
>   page.putToMetadata(Utf8 key, ByteBuffer value)
> It's then read in an IndexingFilter by
>   page.getFromMetadata(Utf8 key)
>
> Sebastian
>
> On 04/02/2014 05:42 PM, Yann Levreau wrote:
> > Hello,
> >
> > Maybe this is the wrong place to post a request, so forgive me, but I
> > really need some help (Nutch 2.2.1):
> >
> > I need to add a new field to be indexed by ElasticSearch.
> >
> > In 1.7, we had:
> > The HtmlParseFilter extension with:
> >   ParseResult filter(Content content, ParseResult parseResult,
> >                      HTMLMetaTags metaTags, DocumentFragment doc)
> >   http://nutch.apache.org/apidocs-1.7/org/apache/nutch/parse/HtmlParseFilter.html#filter%28org.apache.nutch.protocol.Content,%20org.apache.nutch.parse.ParseResult,%20org.apache.nutch.parse.HTMLMetaTags,%20org.w3c.dom.DocumentFragment%29
> >
> > The IndexingFilter extension with:
> >   NutchDocument filter(NutchDocument doc, Parse parse,
> >                        org.apache.hadoop.io.Text url, CrawlDatum datum,
> >                        Inlinks inlinks)
> >   http://nutch.apache.org/apidocs-1.7/org/apache/nutch/indexer/IndexingFilter.html#filter%28org.apache.nutch.indexer.NutchDocument,%20org.apache.nutch.parse.Parse,%20org.apache.hadoop.io.Text,%20org.apache.nutch.crawl.CrawlDatum,%20org.apache.nutch.crawl.Inlinks%29
> >
> > Adding a field worked fine.
> >
> > In 2.2.1 we have:
> > The ParseFilter extension:
> >   Parse filter(String url, WebPage page, Parse parse,
> >                HTMLMetaTags metaTags, DocumentFragment doc)
> >   http://nutch.apache.org/apidocs-2.2/org/apache/nutch/parse/ParseFilter.html#filter%28java.lang.String,%20org.apache.nutch.storage.WebPage,%20org.apache.nutch.parse.Parse,%20org.apache.nutch.parse.HTMLMetaTags,%20org.w3c.dom.DocumentFragment%29
> > In the Parse type, we don't have "getData()", so we can't add new metadata.
> >
> > The IndexingFilter extension:
> >   NutchDocument filter(NutchDocument doc, String url, WebPage page)
> >   http://nutch.apache.org/apidocs-2.2/org/apache/nutch/indexer/IndexingFilter.html#filter%28org.apache.nutch.indexer.NutchDocument,%20java.lang.String,%20org.apache.nutch.storage.WebPage%29
> > We don't get a Parse parameter from which to add a field to the NutchDocument.
> >
> > So what is the new way to add a custom field to the index? Maybe I'm
> > missing something...
> > Thank you very much!
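Sebastian's two-step approach (store the value with `page.putToMetadata` in a ParseFilter, read it back with `page.getFromMetadata` in an IndexingFilter) can be sketched with JDK-only stand-ins. `PageStandIn`, `CustomParseStep`, `CustomIndexStep` and the key `custom_field` are hypothetical names for illustration: the real WebPage methods take an Avro `Utf8` key rather than a `String`, and the `Map` stands in for the NutchDocument to which a real filter would add the field.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the metadata part of Nutch 2.x's WebPage.
// The real methods use an Avro Utf8 key; String keeps this JDK-only.
final class PageStandIn {
  private final Map<String, ByteBuffer> metadata = new HashMap<>();

  void putToMetadata(String key, ByteBuffer value) {
    metadata.put(key, value);
  }

  ByteBuffer getFromMetadata(String key) {
    return metadata.get(key);
  }
}

// Step 1 (parse time): what a custom ParseFilter would do once it has
// extracted a value, e.g. from the DOM or the HTML meta tags.
final class CustomParseStep {
  static final String KEY = "custom_field"; // hypothetical metadata key

  static void filter(PageStandIn page, String extractedValue) {
    page.putToMetadata(KEY, ByteBuffer.wrap(extractedValue.getBytes(StandardCharsets.UTF_8)));
  }
}

// Step 2 (index time): what a custom IndexingFilter would do; the Map
// stands in for the NutchDocument receiving the new field.
final class CustomIndexStep {
  static Map<String, String> filter(Map<String, String> doc, PageStandIn page) {
    ByteBuffer raw = page.getFromMetadata(CustomParseStep.KEY);
    if (raw != null) {
      // duplicate() leaves the stored buffer's position untouched
      doc.put(CustomParseStep.KEY, StandardCharsets.UTF_8.decode(raw.duplicate()).toString());
    }
    return doc; // unchanged when the parse step stored nothing
  }
}
```

One detail worth checking in a real 2.x plugin: the filters declare the WebPage fields they need through `getFields()`, so the indexing filter probably has to include the metadata field there, otherwise the stored value may not be loaded when indexing.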

