Your understanding is correct. The MapReduce paradigm is designed for processing large batches of data in parallel. Additionally, job startup in Hadoop's implementation is rather costly: for jobs that run for a minute or less, more than 50% of the total time can be spent on startup alone.
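To make that concrete, here is roughly what the batch pattern looks like with the old mapred Java API -- just a sketch (untested, and the paths here are made up for illustration): run the indexing job against DFS, then pull the finished index down to the local filesystem for searching, which is the programmatic equivalent of the copyToLocal command quoted below.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class BatchIndex {
    public static void main(String[] args) throws Exception {
      // Configure a batch job; a real indexer would set its own
      // Mapper/Reducer classes here (the defaults are identity).
      JobConf job = new JobConf(BatchIndex.class);
      job.setJobName("batch-index");
      FileInputFormat.setInputPaths(job, new Path("/user/paul/docs"));   // input on DFS
      FileOutputFormat.setOutputPath(job, new Path("/user/paul/index")); // output on DFS
      JobClient.runJob(job); // blocks until the whole batch completes

      // Search against local disk, not DFS: copy the finished index
      // out of DFS (same effect as "hadoop dfs -copyToLocal").
      FileSystem dfs = FileSystem.get(new Configuration());
      dfs.copyToLocalFile(new Path("/user/paul/index"),
                          new Path("/local/search/index"));
    }
  }

The per-job startup overhead mentioned above is paid on every runJob() call, which is why this pattern only pays off for sizable batches, not per-document updates.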
If you have a dataset that you can afford to index in batches, then Hadoop is an excellent solution (as evidenced by Nutch).

Thanks,
Stu

-----Original Message-----
From: "Paul H." <[EMAIL PROTECTED]>
Sent: Saturday, October 27, 2007 12:19pm
To: [email protected]
Subject: Hadoop for incremental index building and search

Hello,

According to a Hadoop tutorial on the wiki (http://wiki.apache.org/nutch/NutchHadoopTutorial), "you don't want to search using DFS, you want to search using local filesystems. Once the index has been created on the DFS you can use the hadoop copyToLocal command to move it to the local file system as such".

So my understanding is that Hadoop is only good for batch index building and is not suitable for incremental index building and search. Is this true? By "incremental index building and search", I mean a system that accepts text on the fly, builds the index, and makes it available for search immediately.

cheers,
Paul
