Hi,

I'm crawling a few dozen websites using Nutch. I'd like to pull out the
crawled data, i.e. the HTML content of millions of web pages, for processing.

I found a similar question at:
http://www.mail-archive.com/[email protected]/msg01190.html

Does anyone have experience using Pig to process Nutch files?

Also, where can I find the file format for Nutch 1.2? The one currently on
the wiki page is for the 0.x versions.

Thanks.

-- K
