Using Yahoo Pig to load Nutch files

Khang Ich Wed, 01 Dec 2010 22:12:30 -0800

Hi,

I'm crawling a few dozen websites using Nutch. I'd like to pull out the
crawled data, i.e. the HTML content of millions of web pages, to process it.


I found a similar question at:
http://www.mail-archive.com/[email protected]/msg01190.html

Does anyone have experience using Pig to process Nutch files?

Also, where can I find the file format for Nutch 1.2? The one currently on
the wiki page is for the 0.x version.

Thanks.

-- K
