In addition to Sebastian's mail: 2.x has an index-metadata filter. If you want
to send any field that is in the metadata to the index, you just write its
name in the configuration.

I recommend you look at index-metadata.
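The idea behind index-metadata (name the fields in the configuration instead of writing a filter) can be illustrated with a minimal Java sketch. This is not the plugin's source. The class, the method, and the comma-separated setting it parses are hypothetical. Page metadata is modelled as a plain `Map<String, ByteBuffer>` rather than a `WebPage`. Check the plugin's own documentation for the real property name and field naming.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration of a configuration-driven metadata copier:
 * every key named in a comma-separated setting is copied from the page
 * metadata into the document, if the page has it. Not index-metadata's code.
 */
public class ConfiguredMetadataCopier {

  static Map<String, String> copyConfiguredFields(
      String configuredNames, Map<String, ByteBuffer> pageMetadata) {
    Map<String, String> doc = new LinkedHashMap<>();
    for (String name : configuredNames.split(",")) {
      String key = name.trim();
      if (key.isEmpty()) {
        continue; // ignore blank entries such as "a,,b"
      }
      ByteBuffer raw = pageMetadata.get(key);
      if (raw == null) {
        continue; // configured, but not set for this page
      }
      // duplicate() so decoding does not advance the stored buffer's position
      doc.put(key, StandardCharsets.UTF_8.decode(raw.duplicate()).toString());
    }
    return doc;
  }

  public static void main(String[] args) {
    Map<String, ByteBuffer> metadata = new LinkedHashMap<>();
    metadata.put("author", ByteBuffer.wrap("Yann".getBytes(StandardCharsets.UTF_8)));
    metadata.put("lang", ByteBuffer.wrap("fr".getBytes(StandardCharsets.UTF_8)));
    // "lang" is not configured and "category" is absent, so only author is copied
    System.out.println(copyConfiguredFields("author, category", metadata)); // {author=Yann}
  }
}
```

The point of the design is that adding a field becomes a configuration change, provided some ParseFilter has already put the value into the metadata.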

Talat
On 2 Apr 2014 at 23:30, "Sebastian Nagel" <[email protected]>
wrote:

> Hi Yann,
>
> > The Parse type has no getData(), so we can't add new metadata.
> ...
> > So what is the new way to add a custom field to the index? Maybe I'm
> > missing something...
>
> In 2.x, data for custom fields can be added to the WebPage's metadata
> in a ParseFilter via
>  page.putToMetadata(Utf8 key, ByteBuffer value)
> It's then read in an IndexingFilter by
>  page.getFromMetadata(Utf8 key)
>
> Sebastian
>
> On 04/02/2014 05:42 PM, Yann Levreau wrote:
> > Hello,
> >
> > Maybe this is the wrong place to post a request, so forgive me, but I
> > really need some help (Nutch 2.2.1):
> >
> > I need to add a new field to be indexed by Elasticsearch.
> >
> > In 1.7, we had:
> > The HtmlParseFilter extension, with:
> >   ParseResult filter(Content content, ParseResult parseResult,
> >       HTMLMetaTags metaTags, DocumentFragment doc)
> >   <http://nutch.apache.org/apidocs-1.7/org/apache/nutch/parse/HtmlParseFilter.html#filter%28org.apache.nutch.protocol.Content,%20org.apache.nutch.parse.ParseResult,%20org.apache.nutch.parse.HTMLMetaTags,%20org.w3c.dom.DocumentFragment%29>
> >
> > The IndexingFilter extension, with:
> >   NutchDocument filter(NutchDocument doc, Parse parse,
> >       org.apache.hadoop.io.Text url, CrawlDatum datum, Inlinks inlinks)
> >   <http://nutch.apache.org/apidocs-1.7/org/apache/nutch/indexer/IndexingFilter.html#filter%28org.apache.nutch.indexer.NutchDocument,%20org.apache.nutch.parse.Parse,%20org.apache.hadoop.io.Text,%20org.apache.nutch.crawl.CrawlDatum,%20org.apache.nutch.crawl.Inlinks%29>
> >
> > Adding a field worked fine.
> >
> > In 2.2.1, we have:
> > The ParseFilter extension, with:
> >   Parse filter(String url, WebPage page, Parse parse,
> >       HTMLMetaTags metaTags, DocumentFragment doc)
> >   <http://nutch.apache.org/apidocs-2.2/org/apache/nutch/parse/ParseFilter.html#filter%28java.lang.String,%20org.apache.nutch.storage.WebPage,%20org.apache.nutch.parse.Parse,%20org.apache.nutch.parse.HTMLMetaTags,%20org.w3c.dom.DocumentFragment%29>
> > The Parse type has no getData(), so we can't add new metadata.
> >
> > The IndexingFilter extension, with:
> >   NutchDocument filter(NutchDocument doc, String url, WebPage page)
> >   <http://nutch.apache.org/apidocs-2.2/org/apache/nutch/indexer/IndexingFilter.html#filter%28org.apache.nutch.indexer.NutchDocument,%20java.lang.String,%20org.apache.nutch.storage.WebPage%29>
> > There is no Parse parameter, so we can't add the field to the NutchDocument.
> >
> > So what is the new way to add a custom field to the index? Maybe I'm
> > missing something...
> > Thank you very much!
> >
>
>
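Sebastian's answer quoted above amounts to a two-step hand-off through the page metadata: the ParseFilter writes a value with putToMetadata, and the IndexingFilter reads it back with getFromMetadata and adds it to the document. The sketch below shows that hand-off in self-contained Java under stated assumptions: `PageStub`, the `my_field` key, and the `Map` used as the document are hypothetical stand-ins for `WebPage`, your own field name, and `NutchDocument`. `String` keys replace Avro's `Utf8` so it compiles without Nutch on the classpath. In a real plugin, the two static methods would roughly correspond to the bodies of your ParseFilter and IndexingFilter `filter(...)` methods.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class MetadataHandOff {

  /** Hypothetical stand-in for WebPage's metadata accessors (String replaces Utf8). */
  static class PageStub {
    private final Map<String, ByteBuffer> metadata = new HashMap<>();

    void putToMetadata(String key, ByteBuffer value) {
      metadata.put(key, value);
    }

    ByteBuffer getFromMetadata(String key) {
      return metadata.get(key);
    }
  }

  /** Hypothetical metadata key shared by both filters. */
  static final String KEY = "my_field";

  /** ParseFilter step: store the extracted value as UTF-8 bytes. */
  static void parseFilterStep(PageStub page, String extracted) {
    page.putToMetadata(KEY, ByteBuffer.wrap(extracted.getBytes(StandardCharsets.UTF_8)));
  }

  /** IndexingFilter step: decode the stored bytes and add them as a field. */
  static Map<String, Object> indexingFilterStep(Map<String, Object> doc, PageStub page) {
    ByteBuffer raw = page.getFromMetadata(KEY);
    if (raw != null) {
      // duplicate() leaves the stored buffer's position untouched
      doc.put(KEY, StandardCharsets.UTF_8.decode(raw.duplicate()).toString());
    }
    return doc;
  }

  public static void main(String[] args) {
    PageStub page = new PageStub();
    parseFilterStep(page, "blue");
    System.out.println(indexingFilterStep(new HashMap<>(), page)); // {my_field=blue}
  }
}
```

Decoding through `duplicate()` matters in this sketch because `Charset.decode` advances the buffer it reads. Decoding the stored buffer directly would leave it empty for the next reader.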
