Hi, I'm crawling a few dozen websites using Nutch. I'd like to pull out the crawled data, i.e. the HTML content of millions of web pages, for processing.
I found a similar question at: http://www.mail-archive.com/[email protected]/msg01190.html Does anyone have experience using Pig to process Nutch files? Also, where can I find the file format for Nutch 1.2? The one currently on the wiki page is for version 0.x. Thanks. -- K

