My main concern with the Nutch2Tutorial was that it didn't stand by itself. As a newcomer to Nutch, I treated the NutchTutorial (for 1.x) with suspicion because I didn't know what is relevant for Nutch 2 and what isn't. And the Nutch2Tutorial alone is not enough to get you going.
I think this can be addressed by creating a single page, or perhaps several pages, that together cover everything you need to perform a basic crawl:

- Configuring the data store
  - HBase
  - Cassandra
  - MySQL
- General Nutch 2 client configuration that is relevant to any store
- Crawling
  - Crawling step by step (running each step separately)
  - Performing a full crawl
    - using the crawl script
    - using the job file

On Wed, Jan 22, 2014 at 1:53 PM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:

> Thanks Tejas!
>
>
> On 22 January 2014 11:51, Tejas Patil <tejas.patil...@gmail.com> wrote:
>
>> Moved the old nutchhadooptutorial page from Nutch wiki "Front page" to
>> "Archive and Legacy".
>>
>> ~tejas
>>
>>
>> On Wed, Jan 22, 2014 at 5:09 PM, Tejas Patil <tejas.patil...@gmail.com> wrote:
>>
>>> Thanks *Julien* for pointing me to new "NutchHadoopSingleNodeTutorial"
>>> wiki page [0]. I would soon remove the old nutchhadooptutorial page
>>> from wiki.
>>>
>>> [0] : http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial
>>>
>>> *@d_k*, there are already tutorials for running Nutch 2.x. See [1] and
>>> [2]. Those are not as extensive as the tutorial for 1.x [3] but carry the
>>> steps which are different for 2.x. The rest steps after datastore setup are
>>> similar - the only difference being in the command params which can be
>>> figured out from the usage and so they were not duplicated in those 2.x
>>> tutorials to avoid maintenance overhead. Do you think that the 2.x
>>> tutorials are inadequate in some regards ?
>>>
>>> [1] : http://wiki.apache.org/nutch/Nutch2Tutorial
>>> [2] : http://wiki.apache.org/nutch/Nutch2Cassandra
>>> [3] : http://wiki.apache.org/nutch/NutchTutorial
>>>
>>> Thanks,
>>> Tejas
>>>
>>>
>>> On Wed, Jan 22, 2014 at 2:47 AM, d_k <mail...@gmail.com> wrote:
>>>
>>>> Actually what I would like to see is a Nutch 2.x tutorial at the same
>>>> level of detail as the http://wiki.apache.org/nutch/NutchHadoopTutorial
>>>> What is the process of contributing to that wiki page?
>>>>
>>>>
>>>> On Tue, Jan 21, 2014 at 9:33 PM, Julien Nioche <lists.digitalpeb...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> The whole thing has been replaced with
>>>>>
>>>>> http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial which
>>>>> does exactly what you described. +1 to remove the old
>>>>> nutchhadooptutorial page
>>>>>
>>>>> J.
>>>>>
>>>>>
>>>>> On 21 January 2014 17:44, Tejas Patil <tejas.patil...@gmail.com> wrote:
>>>>>
>>>>>> Hi nutch-dev,
>>>>>>
>>>>>> I was looking at [0] and realized that with the massive number of
>>>>>> Hadoop setup tutorials out there on internet, we need not repeat the same
>>>>>> on nutch wiki page and instead assume that user has already done Hadoop
>>>>>> setup. For convinience, we could direct users to the Hadoop wiki page
>>>>>> which has Hadoop setup details.
>>>>>> Plus, I propose following:
>>>>>>
>>>>>> - Section "Downloading Hadoop and Nutch" : Remove the Hadoop portions
>>>>>> and let the Nutch stuff stay.
>>>>>> - Section "Setting Up The Deployment Architecture" must be removed.
>>>>>> - Section "Deploy Nutch to Single Machine" and "Deploy Nutch to
>>>>>> Multiple Machines" can be merged together.
>>>>>> - Section "Performing a Nutch Crawl", "Testing the Crawl" and
>>>>>> "Performing a Search" must be merged, its contents must be updated.
>>>>>> - Section "Rsyncing Code to Slaves" and "Updates" can be completely
>>>>>> removed.
>>>>>>
>>>>>> Any comments ?
>>>>>>
>>>>>> [0] : http://wiki.apache.org/nutch/NutchHadoopTutorial
>>>>>>
>>>>>> Thanks,
>>>>>> Tejas
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Open Source Solutions for Text Engineering
>>>>>
>>>>> http://digitalpebble.blogspot.com/
>>>>> http://www.digitalpebble.com
>>>>> http://twitter.com/digitalpebble
>>>>>
>>>>
>>>
>>
>
>
> --
>
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>