(Apologies for cross-posting...)

Good afternoon everyone,
The 1.5 release of Nutch is now available. This release includes upgrades of several major components, among them Tika 1.1 and Hadoop 1.0.0, improvements to the LinkRank and WebGraph elements, and a number of new plugins covering blacklisting, filtering and parsing, to name a few. Please see the list of changes made in this version for a full breakdown of the 50-odd improvements the release boasts:

http://www.apache.org/dist/nutch/CHANGES-1.5.txt

A full PMC release statement can be found here:

http://nutch.apache.org/#07+June+2012+-+Apache+Nutch+1.5+Released

Apache Nutch is an open source web-search software project. Stemming from Apache Lucene, it now builds on Apache Solr, adding web-specifics such as a crawler, a link-graph database and parsing support handled by Apache Tika for HTML and an array of other document formats. Nutch can run on a single machine, but gains a lot of its strength from running in a Hadoop cluster. The system can be enhanced (e.g. other document formats can be parsed) using a highly flexible, easily extensible and thoroughly maintained plugin infrastructure.

Nutch is available in source and binary form (zip and tar.gz) from the following download page:

http://www.apache.org/dyn/closer.cgi/nutch/

In the initial 48 hours, the release may not be available on all mirrors. When downloading from a mirror site, please remember to verify the downloads using signatures found on the Apache site:

http://www.apache.org/dist/nutch/KEYS

For more information on Apache Nutch, visit the project home page:

http://nutch.apache.org

Thank you very much,

Lewis John McGibbney
(on behalf of the Apache Nutch community)

