Hi,

Just to let you know that we have just released version 0.3 of crawler-commons. Crawler-commons is a set of reusable Java components that implement functionality common to any web crawler. These components benefit from collaboration among various existing web crawler projects and reduce duplication of effort. The main components are a robots.txt parser, a sitemap parser, domain utilities, and fetchers.
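As a quick illustration of the robots.txt component, here is a minimal sketch of parsing a robots.txt body and checking whether a URL may be fetched. It assumes the crawler-commons jar is on the classpath and uses the `SimpleRobotRulesParser` class; the agent name "mycrawler" and the example URLs are made up for the example.

```java
import crawlercommons.robots.BaseRobotRules;
import crawlercommons.robots.SimpleRobotRulesParser;

public class RobotsExample {
    public static void main(String[] args) {
        // A tiny robots.txt body (hypothetical content, for illustration only)
        String robotsTxt = "User-agent: *\nDisallow: /private/\n";

        // Parse the rules on behalf of a crawler identifying as "mycrawler"
        SimpleRobotRulesParser parser = new SimpleRobotRulesParser();
        BaseRobotRules rules = parser.parseContent(
                "http://www.example.com/robots.txt", // URL the rules were fetched from
                robotsTxt.getBytes(),                // raw robots.txt content
                "text/plain",                        // content type
                "mycrawler");                        // robot name(s) to match

        // Check individual URLs against the parsed rules
        System.out.println(rules.isAllowed("http://www.example.com/index.html"));        // expect true
        System.out.println(rules.isAllowed("http://www.example.com/private/data.html")); // expect false
    }
}
```

Bixo and Apache Nutch use this same parser for their robots.txt handling.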
Crawler-commons is used in Bixo and Apache Nutch for parsing robots.txt files.

*Project* -> https://code.google.com/p/crawler-commons/
*Release notes* -> http://crawler-commons.googlecode.com/svn/tags/crawler-commons-0.3/CHANGES.txt
*Info about artifacts* -> http://search.maven.org/#artifactdetails|com.google.code.crawler-commons|crawler-commons|0.3|jar

Thanks!

Julien

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

