RE: Post process Nutch data

Srikanth Shankara Rao Mon, 05 May 2014 21:59:22 -0700

Thanks Julien. This helps. I’ll look into this.

From: Julien Nioche [mailto:[email protected]]
Sent: Monday, May 05, 2014 8:57 PM
To: [email protected]
Subject: Re: Post process Nutch data


Hi

As mentioned earlier in a different discussion on this list, Behemoth would be 
the right tool for this.

Julien

On Monday, 5 May 2014, Srikanth Shankara Rao 
<[email protected]> wrote:

Hi All,

I have crawled data using Nutch 1.8. The data is in HDFS. I would like to 
post-process this data before indexing it into Solr. The idea is to transform the 
data based on the content and add a few additional fields that describe the 
content.

I would like to do this as part of a Hadoop job. What would be the best place 
to add code?

Thanks
Srikanth


--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
