Re: Nutchpy crawled statistics

Mattmann, Chris A (3980) Sun, 22 Feb 2015 16:53:03 -0800

Exactly, Mohammad, thank you.

Cheers,
Chris



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Mohammad Al-Mohsin <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, February 20, 2015 at 9:24 PM
To: "[email protected]" <[email protected]>
Subject: Re: Nutchpy crawled statistics

>Hi Pranshu,
>
>I assume you're talking about the
>CS-572 <http://sunset.usc.edu/classes/cs572_2015/> class assignment at
>USC.
>
>I think the stats provided by bin/nutch for the crawldb are sufficient
>(Dr. Mattmann, please correct me if I'm wrong).
>
>However, you need to write a script or program to extract the MIME types you
>encountered. You can do this natively in Java or, if you prefer Python
>like me, you can use
>nutchpy <https://github.com/ContinuumIO/nutchpy>.
>
>Best regards,
>Mohammad Al-Mohsin
>
>On Fri, Feb 20, 2015 at 8:45 PM, Pranshu Kumar
><[email protected]> wrote:
>
>I just wanted to know how we can get the crawl statistics. Is it just a
>matter of using nutch's command-line options, or do we need to write a
>script to generate the stats using nutchpy?
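Mohammad's suggestion to write a small script that tallies the MIME types encountered during the crawl could look roughly like the Python sketch below. It is a minimal sketch under stated assumptions, not nutchpy's API: it assumes you have already pulled the crawl records out of the crawldb or segments (for example with nutchpy, or with a Java reader) into dict-like objects, and that each record stores its MIME type under a key named "Content-Type". That key name and the helper count_mime_types are hypothetical; adapt both to whatever your extracted records actually contain.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_mime_types(records: Iterable[Mapping[str, str]],
                     key: str = "Content-Type") -> Counter:
    """Tally MIME types across already-extracted crawl records.

    Assumption (hypothetical): each record is dict-like and keeps its MIME
    type under `key`. Adjust `key` to match your own extracted data.
    """
    counts: Counter = Counter()
    for record in records:
        raw = record.get(key)
        if not raw:
            # Record carries no MIME type at all.
            counts["(unknown)"] += 1
            continue
        # Drop parameters such as "; charset=utf-8" and normalise case, so
        # "text/html" and "TEXT/HTML; charset=utf-8" land in the same bucket.
        mime = raw.split(";", 1)[0].strip().lower()
        counts[mime or "(unknown)"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    sample = [
        {"Content-Type": "text/html; charset=utf-8"},
        {"Content-Type": "text/html"},
        {"Content-Type": "application/pdf"},
        {},
    ]
    for mime, n in count_mime_types(sample).most_common():
        print(f"{mime}\t{n}")
```

Stripping the parameters is a design choice: it groups pages by media type only, which is usually what a per-type count is after. Keep the full header string instead if charset differences matter for your report.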
