Thanks, Lewis!

Lewis John Mcgibbney <[email protected]> wrote:

Good Evening,

The Apache Nutch PMC are pleased to announce the immediate release of Apache Nutch v1.8.

Apache Nutch is a highly extensible and scalable open source web crawler software project. Stemming from Apache Lucene, the project has diversified and now comprises two codebases, namely:

Nutch 1.x: A well-matured, production-ready crawler. 1.x enables fine-grained configuration, relying on Apache Hadoop data structures, which are great for batch processing.

Nutch 2.x: An emerging alternative taking direct inspiration from 1.x, but which differs in one key area: storage is abstracted away from any specific underlying data store by using Apache Gora for handling object-to-persistent mappings. This means we can implement an extremely flexible model/stack for storing everything (fetch time, status, content, parsed text, outlinks, inlinks, etc.) into a number of NoSQL storage solutions.

We advise all current users and developers of the 1.x series to upgrade to this release. In addition to library upgrades to Crawler Commons 0.3 and Apache Tika 1.4, this release provides over 30 bug fixes as well as 18 improvements. Please see the list of changes for a full breakdown, or see the release report.

As usual in the 1.x series, this release is made available both as source and binary. Additionally, developers can find Maven artifacts within Maven Central.

The release is available here.

Thank you

Lewis
(On behalf of the Nutch PMC)

--
Lewis

