Re: New to Nutch2.x

Sebastian Nagel Thu, 24 Mar 2016 04:25:26 -0700

Hi,

Nutch 2.x is different because it makes use of external data stores
to hold all crawled data.

However, the steps to run a crawl are the same:
 inject
 loop: generate, fetch, parse, update
 invert links, index
Only the arguments passed to run each step may be different.
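
As a rough sketch of that order, here is a small Python snippet that only lists
the step names in sequence. It does not call Nutch. The function name
crawl_plan and the rounds parameter are made up for illustration; the real
command names and arguments for each step should be taken from the tutorial and
the bin/crawl script of your Nutch version.

```python
from typing import List


def crawl_plan(rounds: int) -> List[str]:
    """Return the ordered step names for a crawl with `rounds` loop iterations.

    Hypothetical helper: it only models the order of the steps described
    above and does not invoke any Nutch command.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    steps = ["inject"]  # seed the store once
    for _ in range(rounds):
        # one crawl round: these four steps repeat as a unit
        steps += ["generate", "fetch", "parse", "update"]
    steps += ["invert links", "index"]  # run once after the loop
    return steps


if __name__ == "__main__":
    for number, step in enumerate(crawl_plan(rounds=2), start=1):
        print(f"{number:2d}. {step}")
```

Each round repeats the same four steps, so the number of rounds mostly decides
how deep the crawl goes.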

Have a look at:
 https://wiki.apache.org/nutch/Nutch2Tutorial
and
 the bin/crawl script
which is provided for both 1.x and 2.x
The differences should be obvious.

But may I ask why you do not keep using Nutch 1.x,
which is still maintained and in some respects even better
than 2.x?

Cheers,
Sebastian

On 03/23/2016 06:06 PM, Sabah Sajjad Khan wrote:
> Hello, 
> 
> 
> We worked with nutch1.x for a project and were able to successfully crawl the way we want. Our
> project now requires us to use nutch2.x and we seem to see a lack of documentation to help. We are
> able to inject but have no idea what to do next. Is it the same as nutch1.x? Any help would be
> appreciated as we are students and have been struggling for a good 2 months now!
> 
> 
> Thank you
>
