Thanks Julien. This helps. I’ll look into this.

From: Julien Nioche [mailto:[email protected]]
Sent: Monday, May 05, 2014 8:57 PM
To: [email protected]
Subject: Re: Post process Nutch data

Hi,

As mentioned earlier in a different discussion on this list, Behemoth would be the right tool for this.

Julien

On Monday, 5 May 2014, Srikanth Shankara Rao <[email protected]> wrote:

Hi All,

I have crawled data using Nutch 1.8. The data is in HDFS. I would like to post-process this data before indexing it into Solr. The idea is to transform the data based on the content and add a few additional fields that describe the content. I would like to do this as part of a Hadoop job. What would be the best place to add code?

Thanks,
Srikanth

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

