I've finished porting the changes from 1.2 which were missing in 1.3 and were not related to the Lucene indexing or search:

- NUTCH-878 ScoringFilters should not override the injected score
- NUTCH-901 Make index-more plug-in configurable (Markus Jelsma via mattmann)
- NUTCH-905 Configurable file protocol parent directory crawling (Thorsten Scherler, mattmann, ab)
- NUTCH-855 ScoringFilter and IndexingFilter: To allow for the propagation of URL Metatags and their subsequent indexing (Scott Gonyea via mattmann)
- NUTCH-716 Make subcollection index filed multivalued (Dmitry Lihachev via jnioche)

I've compared the changes from 2.0 with 1.3 and found the following differences (excluding anything specific to 2.0/GORA):

- *NUTCH-564 External parser supports encoding attribute (Antony Bowesman, mattmann)*
- NUTCH-714 Need a SFTP and SCP Protocol Handler (Sanjoy Ghosh, mattmann)
- *NUTCH-825 Publish nutch artifacts to central maven repository (mattmann)*
- NUTCH-851 Port logging to slf4j (jnioche)
- NUTCH-861 Renamed HTMLParseFilter into ParseFilter
- *NUTCH-872 Change the default fetcher.parse to FALSE (ab).*
- *NUTCH-876 Remove remaining robots/IP blocking code in lib-http (ab)*
- NUTCH-880 REST API for Nutch (ab)
- *NUTCH-883 Remove unused parameters from nutch-default.xml (jnioche)*
- *NUTCH-884 FetcherJob should run more reduce tasks than default (ab)*
- *NUTCH-886 A .gitignore file for Nutch (dogacan)*
- *NUTCH-894 Move statistical language identification from indexing to parsing step*
- *NUTCH-921 Reduce dependency of Nutch on config files (ab)*
- *NUTCH-930 Remove remaining dependencies on Lucene API (ab)*
- NUTCH-931 Simple admin API to fetch status and stop the service (ab)
- NUTCH-932 Bulk REST API to retrieve crawl results as JSON (ab)

I've created a new issue on https://issues.apache.org/jira/browse/NUTCH-951 to track this. I'd be in favour of porting only the things that are not new functionality; I have put them in bold above. Any thoughts on this?

Julien

On 4 January 2011 21:44, Julien Nioche <[email protected]> wrote:

> +1 from me.
> I've committed today a bunch of patches which were in 1.2 but
> not in 1.3 (just one last one to do) but haven't compared with 2.0
>
> Having a release based on 1.3 would be great as it would be a nice
> transition towards 2.0 (delegate indexing/search, dependency management with
> Ivy, separation between local and remote deployment, removal of redundant
> plugins etc...).
>
> Julien
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>
>
> On 4 January 2011 20:27, Andrzej Bialecki <[email protected]> wrote:
>
>> Hi users & devs,
>>
>> As you probably know, there are currently two active lines of development
>> for Nutch:
>>
>> * Nutch trunk, a.k.a. Nutch 2.0: this is based on a completely redesigned
>> storage layer that uses Apache Gora, which in turn can use various storage
>> implementations such as HBase, Cassandra, and MySQL. This branch is still
>> largely experimental and unstable, but work is progressing, and at the
>> current pace I think a release should be possible within the next ~6 months.
>> Another important addition on this branch is a REST API that allows using
>> Nutch as a black-box crawling service.
>>
>> * Nutch branch-1.3: this started as a snapshot of Nutch trunk just before
>> merging with nutchbase (i.e. switching to Gora as a storage layer). This
>> branch is still largely similar to the previous versions of Nutch, and uses
>> Hadoop MapFile/SequenceFile and "segments". As compared with release 1.2 it
>> does NOT ship with any search infrastructure, because all search
>> functionality has been delegated to Solr (via SolrIndexer). This is BTW also
>> true about Nutch trunk.
>>
>> Regarding branch-1.2 (which is a maintenance branch after release 1.2)
>> there have been pretty much no updates there, if any. Nutch committer resources
>> are very limited (when it comes to active committers), so I don't expect any
>> maintenance release from this branch to happen...
>>
>> I think that considering the relatively remote release date for Nutch 2.0
>> it would make sense to roll out a 1.3 release based on branch-1.3, after
>> making sure that all critical patches from trunk have been merged in there.
>>
>> What do you think?
>>
>> --
>> Best regards,
>> Andrzej Bialecki <><
>> ___. ___ ___ ___ _ _ __________________________________
>> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
>> ___|||__|| \| || | Embedded Unix, System Integration
>> http://www.sigram.com Contact: info at sigram dot com
>>
>

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

