N.B. The previous message doesn't seem to have been moderated through under my @apache.org address, so I am resending it ;) It has, however, already been distributed to [email protected].

Hi All,

The Apache Nutch PMC are extremely pleased to announce the immediate release of Apache Nutch v1.7.

Apache Nutch is an open source web-search software project. Stemming from Apache Lucene <http://lucene.apache.org/java/>, it now builds on Apache Solr <http://lucene.apache.org/solr/>, adding web-specifics such as a crawler, a link-graph database, and parsing support handled by Apache Tika <http://tika.apache.org/> for HTML and an array of other document formats.

This release includes over 20 bug fixes and as many improvements, most notably a new pluggable indexing architecture <https://issues.apache.org/jira/browse/NUTCH-1047> which currently supports Apache Solr <http://lucene.apache.org/solr> and Elastic Search <http://www.elasticsearch.org/>. Following the recent Nutch 2.2 release, parsing of robots.txt is now delegated to Crawler-Commons <http://code.google.com/p/crawler-commons/>. Key library upgrades have been made to Apache Hadoop <http://hadoop.apache.org> 1.2.0 and Apache Tika <http://tika.apache.org> 1.3.

For a full breakdown of the changes made in this version, please see the list of changes <http://www.apache.org/dist/nutch/1.7/1.7-CHANGES.txt> or the release report <http://s.apache.org/1zE>.

As usual in the 1.x series, the release is made available as binary and source (zip + tar.gz) and is also available within Maven Central <http://search.maven.org/>.

The release is available here <http://www.apache.org/dyn/closer.cgi/nutch/>.

Happy crawling

lewismc (on behalf of the Apache Nutch PMC)

--
*Lewis*

