Thanks Lewis!

Lewis John Mcgibbney <[email protected]> wrote:

Good Evening,

The Apache Nutch PMC are pleased to announce the immediate release of Apache 
Nutch v1.8. 

Apache Nutch is a highly extensible and scalable open source web crawler
software project. Stemming from Apache Lucene, the project has diversified and
now comprises two codebases:

Nutch 1.x: A well-matured, production-ready crawler. 1.x enables fine-grained
configuration, relying on Apache Hadoop data structures, which are great for
batch processing.

Nutch 2.x: An emerging alternative taking direct inspiration from 1.x, but
which differs in one key area: storage is abstracted away from any specific
underlying data store by using Apache Gora for handling object-to-persistent
mappings. This means we can implement an extremely flexible model/stack for
storing everything (fetch time, status, content, parsed text, outlinks,
inlinks, etc.) into a number of NoSQL storage solutions.

We advise all current users and developers of the 1.x series to upgrade to this
release. It includes library upgrades to Crawler Commons 0.3 and Apache Tika
1.4, and provides over 30 bug fixes as well as 18 improvements. Please see the
list of changes for a full breakdown, or see the release report. As usual in
the 1.x series, this release is made available both as source and binary.
Additionally, developers can find Maven artifacts within Maven Central. The
release is available here.

Thank you
Lewis
(On behalf of the Nutch PMC)

-- 
Lewis 
