Hi,

As mentioned earlier in a different discussion on this list, Behemoth would be the right tool for this.

Julien

On Monday, 5 May 2014, Srikanth Shankara Rao <[email protected]> wrote:
>
> Hi All,
>
> I have crawled data using Nutch 1.8. The data is in HDFS. I would like to
> post-process this data before indexing it into SOLR. The idea is to transform
> the data based on the content and add a few additional fields that describe
> the content.
>
> I would like to do this as part of a Hadoop job. What would be the best
> place to add code?
>
> Thanks
> Srikanth
>

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

