Exactly, Mohammad, thank you. Cheers, Chris
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory
Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Mohammad Al-Mohsin <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, February 20, 2015 at 9:24 PM
To: "[email protected]" <[email protected]>
Subject: Re: Nutchpy crawled statistics

>Hi Pranshu,
>
>I assume you're talking about the CS-572
><http://sunset.usc.edu/classes/cs572_2015/> class assignment at USC.
>
>I think the stats provided by bin/nutch for the crawldb are sufficient
>(Dr. Mattmann, correct me if I'm wrong, please).
>
>However, you need to write a script/program to extract the MIME types you
>encountered. You can do this natively with Java or, if you prefer Python
>like me, you can use nutchpy <https://github.com/ContinuumIO/nutchpy>.
>
>Best regards,
>Mohammad Al-Mohsin
>
>On Fri, Feb 20, 2015 at 8:45 PM, Pranshu Kumar
><[email protected]> wrote:
>
>I just wanted to know how we can get the crawl statistics. Is it just
>using the command line options of nutch, or do we need to write a script
>to generate the stats using nutchpy?
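Following Mohammad's suggestion to write a script for the MIME types, here is a minimal Python sketch of just the counting step. It assumes the crawl records have already been loaded (for example with nutchpy, or from a crawldb dump) as dicts carrying a `Content-Type` field. That field name, the `count_mime_types` helper, and the sample URLs are hypothetical illustrations, not part of the Nutch or nutchpy APIs.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_mime_types(records: Iterable[Mapping[str, str]],
                     key: str = "Content-Type") -> Counter:
    """Tally MIME types across crawl records.

    ASSUMPTION: each record is a dict of metadata and the MIME type is
    stored under `key`; adjust this to match how your records are read.
    """
    counts: Counter = Counter()
    for record in records:
        mime = record.get(key)
        if not mime:
            # Records with no type information are grouped together.
            counts["(unknown)"] += 1
            continue
        # Drop parameters such as "; charset=UTF-8" and normalize case so
        # "text/html" and "TEXT/HTML; charset=UTF-8" count as one type.
        counts[mime.split(";", 1)[0].strip().lower()] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample records standing in for real crawl output.
    sample = [
        {"url": "http://example.com/", "Content-Type": "text/html; charset=UTF-8"},
        {"url": "http://example.com/logo.png", "Content-Type": "image/png"},
        {"url": "http://example.com/about", "Content-Type": "text/html"},
        {"url": "http://example.com/data"},
    ]
    for mime, n in count_mime_types(sample).most_common():
        print(f"{mime}\t{n}")
```

Keeping the counting separate from the reading step means the same function works whether the records come from nutchpy in Python or are exported some other way.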

