Hi Guys, Just an update on this.
Please take a look at CHANGES on the new branch I created [0].

I'm waiting on Sebastian's comments, as currently the zip-src and tar-src
targets produce the desired output; however, the tar-bin and zip-bin targets
do not.
If this is not a blocker then I can release the artifacts for a VOTE, but I
wanted to hear from you guys before I do so.

Best
Lewis

[0] http://svn.apache.org/repos/asf/nutch/branches/branch-1.5.1/CHANGES.txt

On Thu, Jun 28, 2012 at 6:42 PM, Lewis John Mcgibbney
<[email protected]> wrote:
> OK this will be done ASAP.
>
> Thanks for the comments and the time.
>
> Lewis
>
> On Thu, Jun 28, 2012 at 8:32 AM, Markus Jelsma
> <[email protected]> wrote:
>> Hello,
>>
>> I'd opt for these additional patches
>> * NUTCH-1398 Upgrade to Hadoop 1.0.3 (jnioche)
>> * NUTCH-1384 Typo in ParseSegments's run-method (Matthias Agethle via markus)
>> * NUTCH-1400 Remove developer -core option for bin/nutch (jnioche)
>>
>> -----Original message-----
>>> From:Lewis John Mcgibbney <[email protected]>
>>> Sent: Wed 27-Jun-2012 20:33
>>> To: [email protected]
>>> Subject: Re: [VOTE] Apache Nutch 1.5.1 Release Candidate
>>>
>>> Hi,
>>>
>>>
>>> On Wed, Jun 27, 2012 at 2:11 PM, Markus Jelsma
>>> <[email protected]> wrote:
>>> > Hello,
>>> >
>>> > I would prefer a minimal bugfix release. The stuff that i committed to
>>> > trunk may still have some quirks that i haven't found yet, the
>>> > HostURLNormalizer thing Sebastian noted was just one of them.
>>> >
>>>
>>> OK so based on the 1.5.1RC#1 CHANGES.txt [0] we currently have the
>>> following commits...
>>>
>>> * NUTCH-1400 Remove developer -core option for bin/nutch (jnioche)
>>>
>>> * NUTCH-1404 Nutch script fails to find job file in deploy mode
>>> (sidabatra, jnioche)
>>>
>>> * NUTCH-1398 Upgrade to Hadoop 1.0.3 (jnioche)
>>>
>>> * NUTCH-1300 Indexer to filter normalize URL's (markus)
>>>
>>> * NUTCH-1330 WebGraph OutlinkDB to preserve back up (markus)
>>>
>>> * NUTCH-1319 HostNormalizer plugin (markus)
>>>
>>> * NUTCH-1386 Headings filter not to add empty values (markus)
>>>
>>> * NUTCH-1356 ParseUtil use ExecutorService instead of manually thread
>>> handling (ferdy via markus)
>>>
>>> * NUTCH-1352 Improve regex urlfilters/normalizers synchronization
>>> (ferdy via markus)
>>>
>>> * NUTCH-1024 Dynamically set fetchInterval by MIME-type (markus)
>>>
>>> * NUTCH-1364 Add a counter in Generator for malformed urls (lewismc)
>>>
>>> * NUTCH-1360 Suport the storing of IP address connected to when web
>>> crawling (lewismc)
>>>
>>> * NUTCH-1262 Map `duplicating` content-types to a single type (markus)
>>>
>>> * NUTCH-1384 Typo in ParseSegments's run-method (Matthias Agethle via
>>> markus)
>>>
>>> * NUTCH-1385 More robust plug-in order properties in nutch-site.xml
>>> (Andy Xue via markus)
>>>
>>> * NUTCH-1336 Optionally not index db_notmodified pages (markus)
>>>
>>> * NUTCH-1346 Follow outlinks to ignore external (markus)
>>>
>>> * NUTCH-1320 IndexChecker and ParseChecker choke on IDN's (markus)
>>>
>>> * NUTCH-1351 DomainStatistics to aggregate by TLD (markus)
>>>
>>> * NUTCH-1381 Allow to override default subcollection field name (markus)
>>>
>>> * NUTCH-XX Commit to add configuration for separation of ant
>>> distribution targets (lewismc + jnioche)
>>>
>>> Do we just wish to include
>>>
>>> * NUTCH-1404 Nutch script fails to find job file in deploy mode
>>> (sidabatra, jnioche) ???
>>>
>>> I can run this tomorrow. Thanks
>>>
>>> [0] http://people.apache.org/~lewismc/apache-nutch-1.5.1-rc1/CHANGES.txt
>>>
>
>
>
> --
> Lewis

--
Lewis

