Hi Pranshu,

I assume you're talking about the CS-572 <http://sunset.usc.edu/classes/cs572_2015/> class assignment at USC.
I think the stats provided by bin/nutch for the crawldb are sufficient (Dr. Mattmann, please correct me if I'm wrong). However, you need to write a script or program to extract the MIME types you encountered. You can do this natively in Java or, if you prefer Python like me, you can use nutchpy <https://github.com/ContinuumIO/nutchpy>.

Best regards,
Mohammad Al-Mohsin

On Fri, Feb 20, 2015 at 8:45 PM, Pranshu Kumar <[email protected]> wrote:

> I just wanted to know how can we get the crawl statistics? Is it just
> using the command line options of nutch or do we need to write a script to
> generate the stats using nutchpy?
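For the MIME-type extraction mentioned in the reply, here is a minimal sketch in plain Python. It does not use nutchpy. It assumes you have already produced a text dump of the crawldb (for example with `bin/nutch readdb <crawldb> -dump <outdir>`) and that each fetched record carries a metadata line of the form `Content-Type=text/html`. That line shape is an assumption about your Nutch version's dump format, and the function name `count_mime_types` is hypothetical, so check a few records by hand first.

```python
import re
from collections import Counter

# Assumption: the crawldb text dump contains metadata lines such as
# "Content-Type=text/html". Adjust the pattern if your dump differs.
CONTENT_TYPE_RE = re.compile(r"Content-Type=([^\s;,]+)")


def count_mime_types(lines):
    """Count MIME types found in an iterable of dump lines."""
    counts = Counter()
    for line in lines:
        match = CONTENT_TYPE_RE.search(line)
        if match:
            # Normalise case so "TEXT/HTML" and "text/html" are merged.
            counts[match.group(1).lower()] += 1
    return counts
```

To use it, open one of the dump's part files and pass the file object to `count_mime_types`, then print `counts.most_common()` to list the MIME types by frequency.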

