Hi guys, I'd just like to mention Crawler Commons, which is an effort between the committers of various crawl-related projects (Nutch, Bixo, Heritrix) to share some basic functionality. At the moment it mostly contains a top-level domain finder and a sitemap parser, but we are definitely planning to add other components as well, e.g. a robots.txt parser, protocol handlers, etc.
Would you like to get involved? There are quite a few components that the Manifold crawler could reuse or contribute to.

Best,

Julien

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
