Hi Pranshu,

I assume you're talking about the CS-572
<http://sunset.usc.edu/classes/cs572_2015/> class assignment at USC.

I think the stats that bin/nutch provides for the crawldb are sufficient (Dr.
Mattmann, please correct me if I'm wrong).

However, you do need to write a script or program to extract the MIME types
you encountered. You can do this natively in Java or, if you prefer Python
like me, you can use nutchpy <https://github.com/ContinuumIO/nutchpy>.
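If you'd rather not depend on nutchpy, here is a rough Python sketch that tallies MIME types from a plain-text crawldb dump (for example, one produced with `bin/nutch readdb <crawldb> -dump <out_dir>`). It assumes that each record's MIME type shows up on its own line as `Content-Type=<type>` or `Content-Type: <type>`. That line format is an assumption, so check it against your own dump and adjust the regex if needed. The function name `count_mime_types` is hypothetical and is not part of Nutch or nutchpy.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# ASSUMPTION: the dump lists each record's MIME type on its own line, e.g.
# "Content-Type=text/html" (possibly indented). Adjust to your dump format.
CONTENT_TYPE = re.compile(r"^\s*Content-Type\s*[=:]\s*([^\s;,]+)", re.IGNORECASE)


def count_mime_types(lines: Iterable[str]) -> Counter:
    """Count MIME types (lower-cased, parameters such as charset dropped)."""
    counts = Counter()
    for line in lines:
        match = CONTENT_TYPE.match(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__":
    # Pass the dump's part-* files on the command line.
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            totals.update(count_mime_types(f))
    # Print "mime<TAB>count", most frequent first.
    for mime, n in totals.most_common():
        print(f"{mime}\t{n}")
```

Usage: `python count_mime.py dump_dir/part-*`.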


Best regards,
Mohammad Al-Mohsin

On Fri, Feb 20, 2015 at 8:45 PM, Pranshu Kumar <[email protected]> wrote:

>
> I just wanted to know how we can get the crawl statistics. Is it just a
> matter of using the nutch command-line options, or do we need to write a
> script to generate the stats using nutchpy?
