Your understanding is correct. The MapReduce paradigm is designed for processing large batches of data in parallel. Additionally, job startup in Hadoop's implementation is rather costly: for jobs that run for a minute or less, more than 50% of the total time can be spent on startup alone.
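To make that concrete, here is roughly what the batch pattern looks like with the old mapred Java API -- just a sketch (untested, and the paths here are made up for illustration): run the indexing job against DFS, then pull the finished index down to the local filesystem for searching, which is the programmatic equivalent of the copyToLocal command quoted below.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class BatchIndex {
    public static void main(String[] args) throws Exception {
      // Configure a batch job; a real indexer would set its own
      // Mapper/Reducer classes here (the defaults are identity).
      JobConf job = new JobConf(BatchIndex.class);
      job.setJobName("batch-index");
      FileInputFormat.setInputPaths(job, new Path("/user/paul/docs"));   // input on DFS
      FileOutputFormat.setOutputPath(job, new Path("/user/paul/index")); // output on DFS
      JobClient.runJob(job); // blocks until the whole batch completes

      // Search against local disk, not DFS: copy the finished index
      // out of DFS (same effect as "hadoop dfs -copyToLocal").
      FileSystem dfs = FileSystem.get(new Configuration());
      dfs.copyToLocalFile(new Path("/user/paul/index"),
                          new Path("/local/search/index"));
    }
  }

The per-job startup overhead mentioned above is paid on every runJob() call, which is why this pattern only pays off for sizable batches, not per-document updates.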
If you have a dataset that you can afford to index in batches, then Hadoop is an excellent solution (as evidenced by Nutch).

Thanks,
Stu

-----Original Message-----
From: "Paul H." <[EMAIL PROTECTED]>
Sent: Saturday, October 27, 2007 12:19pm
To: [email protected]
Subject: Hadoop for incremental index building and search

Hello,

According to a Hadoop tutorial on the wiki (http://wiki.apache.org/nutch/NutchHadoopTutorial), "you don't want to search using DFS, you want to search using local filesystems. Once the index has been created on the DFS you can use the hadoop copyToLocal command to move it to the local file system as such".

So my understanding is that Hadoop is only good for batch index building and is not suitable for incremental index building and search. Is this true? By "incremental index building and search", I mean a system that accepts text on the fly, builds the index, and makes it available for search immediately.

cheers,
Paul
