+1

On 28 June 2012 08:32, Markus Jelsma <[email protected]> wrote:
> Hello,
>
> I'd opt for these additional patches
> * NUTCH-1398 Upgrade to Hadoop 1.0.3 (jnioche)
> * NUTCH-1384 Typo in ParseSegments's run-method (Matthias Agethle via
>   markus)
> * NUTCH-1400 Remove developer -core option for bin/nutch (jnioche)
>
> -----Original message-----
> > From: Lewis John Mcgibbney <[email protected]>
> > Sent: Wed 27-Jun-2012 20:33
> > To: [email protected]
> > Subject: Re: [VOTE] Apache Nutch 1.5.1 Release Candidate
> >
> > Hi,
> >
> > On Wed, Jun 27, 2012 at 2:11 PM, Markus Jelsma
> > <[email protected]> wrote:
> > > Hello,
> > >
> > > I would prefer a minimal bugfix release. The stuff that i committed to
> > > trunk may still have some quirks that i haven't found yet, the
> > > HostURLNormalizer thing Sebastian noted was just one of them.
> > >
> >
> > OK so based on the 1.5.1RC#1 CHANGES.txt [0] we currently have the
> > following commits...
> >
> > * NUTCH-1400 Remove developer -core option for bin/nutch (jnioche)
> >
> > * NUTCH-1404 Nutch script fails to find job file in deploy mode
> >   (sidabatra, jnioche)
> >
> > * NUTCH-1398 Upgrade to Hadoop 1.0.3 (jnioche)
> >
> > * NUTCH-1300 Indexer to filter normalize URL's (markus)
> >
> > * NUTCH-1330 WebGraph OutlinkDB to preserve back up (markus)
> >
> > * NUTCH-1319 HostNormalizer plugin (markus)
> >
> > * NUTCH-1386 Headings filter not to add empty values (markus)
> >
> > * NUTCH-1356 ParseUtil use ExecutorService instead of manually thread
> >   handling (ferdy via markus)
> >
> > * NUTCH-1352 Improve regex urlfilters/normalizers synchronization
> >   (ferdy via markus)
> >
> > * NUTCH-1024 Dynamically set fetchInterval by MIME-type (markus)
> >
> > * NUTCH-1364 Add a counter in Generator for malformed urls (lewismc)
> >
> > * NUTCH-1360 Suport the storing of IP address connected to when web
> >   crawling (lewismc)
> >
> > * NUTCH-1262 Map `duplicating` content-types to a single type (markus)
> >
> > * NUTCH-1384 Typo in ParseSegments's run-method (Matthias Agethle via
> >   markus)
> >
> > * NUTCH-1385 More robust plug-in order properties in nutch-site.xml
> >   (Andy Xue via markus)
> >
> > * NUTCH-1336 Optionally not index db_notmodified pages (markus)
> >
> > * NUTCH-1346 Follow outlinks to ignore external (markus)
> >
> > * NUTCH-1320 IndexChecker and ParseChecker choke on IDN's (markus)
> >
> > * NUTCH-1351 DomainStatistics to aggregate by TLD (markus)
> >
> > * NUTCH-1381 Allow to override default subcollection field name (markus)
> >
> > * NUTCH-XX Commit to add configuration for separation of ant
> >   distribution targets (lewismc + jnioche)
> >
> > Do we just wish to include
> >
> > * NUTCH-1404 Nutch script fails to find job file in deploy mode
> >   (sidabatra, jnioche) ???
> >
> > I can run this tomorrow. Thanks
> >
> > [0] http://people.apache.org/~lewismc/apache-nutch-1.5.1-rc1/CHANGES.txt
> >
>

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

