Actually, the aim of my study is to determine whether Hadoop can be used to
parallelize a Java application that uses the Lucene API. From what I've
read and understood, Nutch is "only" an implementation of a web search
engine built on Lucene, and Hadoop is the distributed file system developed
to meet the huge computational demands of such software...
If Hadoop can't be used without Nutch, then I don't understand why it was
split off into a separate project.
You describe Nutch as "the networked version of Lucene", but from what
I've seen it's more specialized than that, designed specifically to deal
with web documents... am I wrong in assuming this?
I'm quite confused now actually...
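To make my question more concrete: what I'd hope to write is something
along the lines of the sketch below, i.e. a MapReduce job driven directly
from a plain Java program, with nothing Nutch-specific involved. It's only
a rough sketch modelled on the word-count example from the docs; the class
names (StandaloneJob, TokenMapper, SumReducer) are my own placeholders and
the exact API may differ between Hadoop versions.

// Rough sketch: a Hadoop job submitted from plain Java, without Nutch.
// Class names and paths are placeholders; API details may vary by version.
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class StandaloneJob {

  // Map phase: emit (token, 1) for every token of every input line.
  public static class TokenMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        output.collect(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each token.
  public static class SumReducer extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(StandaloneJob.class);
    conf.setJobName("standalone-demo");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(TokenMapper.class);
    conf.setReducerClass(SumReducer.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);  // submit the job and wait for completion
  }
}

In my real application the reduce phase would feed a Lucene IndexWriter
instead of summing counts, but the question stands: is this kind of direct
use of the Hadoop API a supported way of working, or is Nutch required?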
This is a very strange project you are working on.
The reason that it is strange is that Hadoop grew out of the networked
version of Lucene that was known as Nutch.
That should mean that you don't need to do anything at all to get Lucene and
Hadoop to work together. Just run Nutch.
On 7/18/07 7:29 AM, "Samuel LEMOINE" <[EMAIL PROTECTED]> wrote:
Hi Hadoopers!
I'm working on Hadoop for an internship, trying to find out what it offers
when used with Lucene... my problem is that I've been reading loads of docs
for a week or so, such as the Google MapReduce and Google File System
papers and the Hadoop file system and MapReduce documentation, and I also
glanced at the API javadoc... but all these docs seem quite arcane to me.
Could anyone point me to a tutorial on getting started with Hadoop and on
how to interface it with Lucene-based software? I've already searched
Hadoop's wiki and the documentation on Hadoop's website, without any clear
success... any help would be greatly appreciated :)
thanks,
Samuel