My main concern with the Nutch2Tutorial was that it didn't stand on its
own. As a newcomer to Nutch, I treated the NutchTutorial (for 1.x) with
suspicion because I didn't know what was relevant for Nutch 2 and what wasn't.
And the Nutch2Tutorial alone is not enough to get you going.

I think this can be addressed by creating a single page, or perhaps several
pages, that together cover everything you need to perform a basic crawl:

[*] Configuring the data store
[**] HBase
[**] Cassandra
[**] MySQL
[*] General Nutch 2 client configuration relevant to any store
[*] Crawling
[**] Crawling step by step (running each step separately)
[**] Performing a full crawl
[***] using the crawl script
[***] using the job file




On Wed, Jan 22, 2014 at 1:53 PM, Julien Nioche <
lists.digitalpeb...@gmail.com> wrote:

> Thanks Tejas!
>
>
> On 22 January 2014 11:51, Tejas Patil <tejas.patil...@gmail.com> wrote:
>
>> Moved the old nutchhadooptutorial page from Nutch wiki "Front page" to
>> "Archive and Legacy".
>>
>> ~tejas
>>
>>
>> On Wed, Jan 22, 2014 at 5:09 PM, Tejas Patil <tejas.patil...@gmail.com> wrote:
>>
>>> Thanks *Julien* for pointing me to the new "NutchHadoopSingleNodeTutorial"
>>> wiki page [0]. I will soon remove the old nutchhadooptutorial page
>>> from the wiki.
>>>
>>> [0] : http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial
>>>
>>> *@d_k*, there are already tutorials for running Nutch 2.x. See [1] and
>>> [2]. Those are not as extensive as the tutorial for 1.x [3], but they cover
>>> the steps that are different for 2.x. The remaining steps after the datastore
>>> setup are similar; the only difference is in the command parameters, which
>>> can be figured out from the usage output, so they were not duplicated in
>>> those 2.x tutorials to avoid maintenance overhead. Do you think that the 2.x
>>> tutorials are inadequate in some regard?
>>>
>>> [1] : http://wiki.apache.org/nutch/Nutch2Tutorial
>>> [2] : http://wiki.apache.org/nutch/Nutch2Cassandra
>>> [3] : http://wiki.apache.org/nutch/NutchTutorial
>>>
>>> Thanks,
>>> Tejas
>>>
>>>
>>> On Wed, Jan 22, 2014 at 2:47 AM, d_k <mail...@gmail.com> wrote:
>>>
>>>> Actually what I would like to see is a Nutch 2.x tutorial at the same
>>>> level of detail as the http://wiki.apache.org/nutch/NutchHadoopTutorial
>>>> What is the process of contributing to that wiki page?
>>>>
>>>>
>>>> On Tue, Jan 21, 2014 at 9:33 PM, Julien Nioche <
>>>> lists.digitalpeb...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> The whole thing has been replaced with
>>>>> http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial, which
>>>>> does exactly what you described. +1 to remove the old
>>>>> nutchhadooptutorial page.
>>>>>
>>>>> J.
>>>>>
>>>>>
>>>> On 21 January 2014 17:44, Tejas Patil <tejas.patil...@gmail.com> wrote:
>>>>>
>>>>>> Hi nutch-dev,
>>>>>>
>>>>>> I was looking at [0] and realized that, with the massive number of
>>>>>> Hadoop setup tutorials out there on the internet, we need not repeat the
>>>>>> same on the Nutch wiki page and can instead assume that the user has
>>>>>> already done the Hadoop setup. For convenience, we could direct users to
>>>>>> the Hadoop wiki page that has the Hadoop setup details.
>>>>>> In addition, I propose the following:
>>>>>>
>>>>>> - Section "Downloading Hadoop and Nutch": Remove the Hadoop portions
>>>>>> and let the Nutch stuff stay.
>>>>>> - Section "Setting Up The Deployment Architecture" must be removed.
>>>>>> - Sections "Deploy Nutch to Single Machine" and "Deploy Nutch to
>>>>>> Multiple Machines" can be merged together.
>>>>>> - Sections "Performing a Nutch Crawl", "Testing the Crawl" and
>>>>>> "Performing a Search" must be merged, and their contents must be updated.
>>>>>> - Sections "Rsyncing Code to Slaves" and "Updates" can be completely
>>>>>> removed.
>>>>>>
>>>>>> Any comments?
>>>>>>
>>>>>> [0] : http://wiki.apache.org/nutch/NutchHadoopTutorial
>>>>>>
>>>>>> Thanks,
>>>>>> Tejas
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Open Source Solutions for Text Engineering
>>>>>
>>>>> http://digitalpebble.blogspot.com/
>>>>> http://www.digitalpebble.com
>>>>> http://twitter.com/digitalpebble
>>>>>
>>>>
>>>>
>>>
>>
>
