Actually, the aim of my study is to determine whether Hadoop can be used to
parallelize a Java application that uses the Lucene API. From what I've
read and understood, Nutch is "only" an implementation of a web search
engine built on Lucene, and Hadoop is the distributed file system developed
to meet the huge computational demands of such software...
If Hadoop can't be used without Nutch, then I don't understand why it was
split off into a separate project.
You describe Nutch as "the networked version of Lucene", but from what
I've seen it's more specialized than that, designed specifically to deal
with web documents... am I wrong in assuming this?
I'm quite confused now actually...
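To make my question more concrete: what I'd hope to write is something
along the lines of the sketch below, i.e. a MapReduce job driven directly
from a plain Java program, with nothing Nutch-specific involved. It's only
a rough sketch modelled on the word-count example from the docs; the class
names (StandaloneJob, TokenMapper, SumReducer) are my own placeholders and
the exact API may differ between Hadoop versions.

// Rough sketch: a Hadoop job submitted from plain Java, without Nutch.
// Class names and paths are placeholders; API details may vary by version.
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class StandaloneJob {

  // Map phase: emit (token, 1) for every token of every input line.
  public static class TokenMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        output.collect(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each token.
  public static class SumReducer extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(StandaloneJob.class);
    conf.setJobName("standalone-demo");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(TokenMapper.class);
    conf.setReducerClass(SumReducer.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);  // submit the job and wait for completion
  }
}

In my real application the reduce phase would feed a Lucene IndexWriter
instead of summing counts, but the question stands: is this kind of direct
use of the Hadoop API a supported way of working, or is Nutch required?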
This is a very strange project you are working on.
The reason that it is strange is that Hadoop grew out of the networked
version of Lucene that was known as Nutch.
That should mean that you don't need to do anything at all to get Lucene and
Hadoop to work together. Just run Nutch.
On 7/18/07 7:29 AM, "Samuel LEMOINE" <[EMAIL PROTECTED]> wrote:
Hi Hadoopers!
I'm working on Hadoop for an internship, trying to find out what it offers
when used with Lucene... my problem is that I've been reading loads of docs
for a week or so, such as the Google MapReduce and Google File System
papers and the Hadoop file system and MapReduce documentation, and I also
glanced at the API javadoc... but all these docs seem quite arcane to me.
Could anyone point me to a tutorial on getting started with Hadoop and on
how to interface it with Lucene-based software? I've already searched
Hadoop's wiki and the documentation on Hadoop's website, without any clear
success... any help would be greatly appreciated :)
thanks,
Samuel