Hi All,

I have crawled data using Nutch 1.8, and the data is stored in HDFS. I would like to post-process this data before indexing it into Solr. The idea is to transform the data based on its content and add a few additional fields that describe the content.
I would like to do this as part of a Hadoop job. Where would be the best place to add this code?

Thanks,
Srikanth

