CrawlerCommons & ManifoldCF

Julien Nioche Thu, 02 Jun 2011 08:11:50 -0700

Hi guys,

I'd just like to mention Crawler Commons, which is a joint effort between
the committers of various crawl-related projects (Nutch, Bixo, Heritrix) to
put some basic functionality in common. We currently have mostly a
top-level domain finder and a sitemap parser, but are definitely planning
to have other things there as well, e.g. a robots.txt parser, protocol
handlers, etc.


Would you like to get involved? There are quite a few things that the
crawler in ManifoldCF could reuse or contribute to.

Best,

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
