Re: Nutchpy crawled statistics

Pranshu Kumar Fri, 20 Feb 2015 21:36:31 -0800

Hi Mohsin,

Thanks for the reply. That is exactly what I was asking. Thanks for
clarifying.


We were also using the bin/nutch stats command, but I just wanted to be sure
whether we have to add more details to the statistics.
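A minimal sketch of how those numbers could be captured for a report. It assumes the statistics come from `bin/nutch readdb <crawldb> -stats` (Nutch 1.x) and that the output contains plain `label:<TAB>value` lines such as `TOTAL urls:` followed by a count. The function name `parse_crawldb_stats` and the exact line shape are assumptions, not part of Nutch:

```python
import re
from typing import Dict, Union

# Assumption: stats lines look like "TOTAL urls:\t1234" or "min score:\t0.0",
# i.e. a label, a colon, whitespace, then a single number.
STAT_LINE = re.compile(r"^(?P<label>[^:]+):\s+(?P<value>-?\d+(?:\.\d+)?)$")


def parse_crawldb_stats(text: str) -> Dict[str, Union[int, float]]:
    """Collect numeric 'label: value' pairs from crawldb stats output.

    Lines that do not match the assumed shape (headers such as
    'Statistics for CrawlDb: ...', 'done' messages) are skipped.
    """
    stats: Dict[str, Union[int, float]] = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if not match:
            continue
        raw = match.group("value")
        # Keep counts as ints and scores as floats.
        stats[match.group("label").strip()] = float(raw) if "." in raw else int(raw)
    return stats
```

Non-matching lines are skipped. If your Nutch version prefixes each line with log metadata (timestamp, logger name), strip that first, and check the resulting keys against your own output.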

And sorry, Professor, about the out-of-context mail. I will be more specific
with my queries from now on.

On Fri, Feb 20, 2015 at 9:24 PM, Mohammad Al-Mohsin <[email protected]> wrote:

> Hi Pranshu,
>
> I assume you're talking about CS-572
> <http://sunset.usc.edu/classes/cs572_2015/> class assignment at USC.
>
> I think the stats provided by bin/nutch for the crawldb are sufficient
> (Dr. Mattmann, please correct me if I'm wrong).
>
> However, you need to write a script/program to extract the MIME types you
> encountered. You can do this natively in Java or, if you prefer Python
> like me, you can use nutchpy <https://github.com/ContinuumIO/nutchpy>.
>
>
> Best regards,
> Mohammad Al-Mohsin
>
> On Fri, Feb 20, 2015 at 8:45 PM, Pranshu Kumar <[email protected]> wrote:
>
>>
>> I just wanted to know how we can get the crawl statistics. Is it just a
>> matter of using Nutch's command-line options, or do we need to write a
>> script to generate the stats using nutchpy?
>>
>>
>>
>
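A minimal sketch of the MIME-type extraction suggested in the quoted reply. It assumes the segment content has been dumped to text with `bin/nutch readseg -dump <segment> <outdir>` and that each content record carries a line of the form `contentType: text/html`. The helper `count_mime_types` and the script name are hypothetical. nutchpy could be used instead to read the segment data directly, but its reader API is not shown here:

```python
import sys
from collections import Counter
from typing import Iterable

PREFIX = "contentType:"  # assumed field name in the text dump; check your dump


def count_mime_types(dump_lines: Iterable[str]) -> Counter:
    """Tally MIME types from 'contentType: <type>' lines of a segment dump."""
    counts: Counter = Counter()
    for line in dump_lines:
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        # Drop parameters such as "; charset=UTF-8" so variants group together.
        mime = line[len(PREFIX):].split(";", 1)[0].strip().lower()
        if mime:
            counts[mime] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python count_mime.py <dump file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as dump:
        for mime, n in count_mime_types(dump).most_common():
            print(f"{n}\t{mime}")
```

Stripping the `charset` parameter groups `text/html` variants together; remove that step if you want them reported separately.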


-- 


*Regards,*

*Pranshu Kumar*

*Computer Science Grad Student*

*University of Southern California*

*E-mail: [email protected]*

*Tel: +1-323-899-3830*