On 9 December 2013 19:37, Debamitro Chakraborti <[email protected]> wrote:
> Any way to crawl the back issues of prominent Indian newspapers like The
> Hindu, TOI, Indian Express, Hindustan Times etc?
> I was part of a team which needed to analyse news reports from a time frame
> and we hacked together a TOI crawler (which still has limitations) and were
> working on a The Hindu crawler -- would love to know about something simpler
> that is already available.

The newsrack.in recommendation is a good one, but Newsrack
is intended as much more than a simple crawler. If a crawler is
what you need, you should look into something like Nutch
( http://nutch.apache.org/ ). If you prefer to write your own for
simple, non-generic needs, we have happily used Scrapy in
the Python world ( http://scrapy.org/ ).
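
To give a rough idea of what "write your own" involves, here is a
minimal sketch using only the Python standard library. It is not
Scrapy and not the code we use, just an illustration of the basic
fetch-and-follow loop that a framework like Scrapy handles for you.
The names LinkExtractor, extract_links and crawl are made up for this
example, and I am assuming the archive pages are plain HTML with
ordinary <a href> links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return same-host links found in html, in order, without duplicates."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    seen = set()
    result = []
    for link in parser.links:
        if urlparse(link).netloc == host and link not in seen:
            seen.add(link)
            result.append(link)
    return result


def crawl(start_url, max_pages=10):
    """Breadth-first fetch of up to max_pages pages; returns {url: html}."""
    queue = [start_url]
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in pages:
            continue
        with urlopen(url, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
        pages[url] = html
        queue.extend(extract_links(html, url))
    return pages
```

This sketch deliberately leaves out robots.txt handling, rate
limiting, retries and error handling, all of which a real crawl of a
newspaper archive would need.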

Regards,
Gora

-- 
For more details about this list
http://datameet.org/discussions/
