Hello,

Maybe this is the wrong place to post a request, so forgive me, but I really need some help (Nutch 2.2.1):
I need to add a new field to be indexed by Elasticsearch. In 1.7 we had:

- The HtmlParseFilter extension, with:
    ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc)
  (http://nutch.apache.org/apidocs-1.7/org/apache/nutch/parse/HtmlParseFilter.html#filter%28org.apache.nutch.protocol.Content,%20org.apache.nutch.parse.ParseResult,%20org.apache.nutch.parse.HTMLMetaTags,%20org.w3c.dom.DocumentFragment%29)
- The IndexingFilter extension, with:
    NutchDocument filter(NutchDocument doc, Parse parse, org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)
  (http://nutch.apache.org/apidocs-1.7/org/apache/nutch/indexer/IndexingFilter.html#filter%28org.apache.nutch.indexer.NutchDocument,%20org.apache.nutch.parse.Parse,%20org.apache.hadoop.io.Text,%20org.apache.nutch.crawl.CrawlDatum,%20org.apache.nutch.crawl.Inlinks%29)

With these two extensions, adding a field worked fine.
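For comparison, here is a minimal, self-contained model of the 1.7 pattern. It is not Nutch code: `Parse`, `ParseMeta` and `Doc` are hypothetical stand-ins for 1.7's `Parse`, its parse metadata, and `NutchDocument`, and the field name `pagelength` is invented for illustration. In a real 1.7 plugin the two steps would roughly correspond to `parseResult.get(content.getUrl()).getData().getParseMeta().set(...)` in the parse filter and `parse.getData().getParseMeta().get(...)` in the indexing filter. What the model shows is that the same `Parse` reaches both filters, so anything stored at parse time can be read back at index time.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the Nutch 1.7 flow; the nested classes are hypothetical stand-ins, not Nutch APIs. */
public class Nutch17Flow {

  /** Stands in for the parse metadata (1.7: parse.getData().getParseMeta()). */
  static class ParseMeta {
    private final Map<String, String> values = new HashMap<>();
    void set(String name, String value) { values.put(name, value); }
    String get(String name) { return values.get(name); }
  }

  /** Stands in for Parse: in 1.7 this object is handed to BOTH filters. */
  static class Parse {
    final ParseMeta meta = new ParseMeta();
  }

  /** Stands in for NutchDocument: field name -> value. */
  static class Doc {
    final Map<String, String> fields = new HashMap<>();
    void add(String name, String value) { fields.put(name, value); }
  }

  static final String FIELD = "pagelength"; // hypothetical custom field name

  /** HtmlParseFilter role: compute a value from the raw page and stash it in the parse metadata. */
  static Parse parseFilter(String rawHtml, Parse parse) {
    parse.meta.set(FIELD, Integer.toString(rawHtml.length()));
    return parse;
  }

  /** IndexingFilter role: read the stashed value back and add it as an index field. */
  static Doc indexingFilter(Doc doc, Parse parse) {
    String value = parse.meta.get(FIELD);
    if (value != null) {
      doc.add(FIELD, value);
    }
    return doc;
  }

  public static void main(String[] args) {
    Parse parse = parseFilter("<html>hi</html>", new Parse());
    Doc doc = indexingFilter(new Doc(), parse);
    System.out.println(doc.fields); // prints {pagelength=15}
  }
}
```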
In 2.2.1 we have:

- The ParseFilter extension, with:
    Parse filter(String url, WebPage page, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
  (http://nutch.apache.org/apidocs-2.2/org/apache/nutch/parse/ParseFilter.html#filter%28java.lang.String,%20org.apache.nutch.storage.WebPage,%20org.apache.nutch.parse.Parse,%20org.apache.nutch.parse.HTMLMetaTags,%20org.w3c.dom.DocumentFragment%29)
  The Parse type has no getData() method, so we can't add new metadata to it.
- The IndexingFilter extension, with:
    NutchDocument filter(NutchDocument doc, String url, WebPage page)
  (http://nutch.apache.org/apidocs-2.2/org/apache/nutch/indexer/IndexingFilter.html#filter%28org.apache.nutch.indexer.NutchDocument,%20java.lang.String,%20org.apache.nutch.storage.WebPage%29)
  There is no Parse parameter, so there is nothing to read a value from and add as a field to the NutchDocument.

So what is the new way to add a custom field to the index? Maybe I'm missing something...

Thank you very much!
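For reference, here is a minimal sketch of one approach that might work in 2.2.1, untested and resting on assumptions: since `WebPage` is the only object both 2.2.1 signatures share, the ParseFilter could write the value into the page's metadata map and the IndexingFilter could read it back, assuming that metadata is persisted between the parse and index jobs. The classes below are hypothetical stand-ins so the sketch runs without Nutch on the classpath, and `pagelength` is an invented field name. In a real plugin the calls would, I believe, be `page.putToMetadata(new Utf8(key), buffer)` and `page.getFromMetadata(new Utf8(key))`, and the indexing filter would probably need to return `WebPage.Field.METADATA` from `getFields()` so that column is loaded; those names should be checked against the 2.2 javadoc.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Toy model of the 2.2.1 hook shapes; the nested classes are hypothetical stand-ins, not Nutch APIs. */
public class Nutch22Flow {

  /** Stands in for WebPage: the only object both 2.x filters receive. */
  static class WebPage {
    // Assumption: models WebPage's metadata map (Utf8 -> ByteBuffer in 2.2.1, I believe).
    final Map<String, ByteBuffer> metadata = new HashMap<>();
  }

  /** Stands in for Parse: no getData(), so nothing can be stashed on it. */
  static class Parse {
    final String text;
    Parse(String text) { this.text = text; }
  }

  /** Stands in for NutchDocument: field name -> value. */
  static class Doc {
    final Map<String, String> fields = new HashMap<>();
    void add(String name, String value) { fields.put(name, value); }
  }

  static final String FIELD = "pagelength"; // hypothetical custom field name

  /** ParseFilter role (filter(url, page, parse, ...) minus the DOM arguments): store the value on the page. */
  static Parse parseFilter(String url, WebPage page, Parse parse) {
    byte[] bytes = Integer.toString(parse.text.length()).getBytes(StandardCharsets.UTF_8);
    page.metadata.put(FIELD, ByteBuffer.wrap(bytes));
    return parse;
  }

  /** IndexingFilter role (filter(doc, url, page)): read the value back from the page and add it. */
  static Doc indexingFilter(Doc doc, String url, WebPage page) {
    ByteBuffer raw = page.metadata.get(FIELD);
    if (raw != null) {
      // duplicate() so decoding does not move the stored buffer's position
      ByteBuffer view = raw.duplicate();
      byte[] bytes = new byte[view.remaining()];
      view.get(bytes);
      doc.add(FIELD, new String(bytes, StandardCharsets.UTF_8));
    }
    return doc;
  }

  public static void main(String[] args) {
    WebPage page = new WebPage();
    parseFilter("http://example.com/", page, new Parse("hello world"));
    Doc doc = indexingFilter(new Doc(), "http://example.com/", page);
    System.out.println(doc.fields); // prints {pagelength=11}
  }
}
```

The value is stored as bytes to mirror the assumed `ByteBuffer`-valued metadata map, and reading through `duplicate()` means the same page can be indexed more than once without the stored buffer appearing empty.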

