On Tue, Mar 11, 2008 at 5:18 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> Yes. Each task is launching a JVM.

Guess that would explain the slowness :) Is HDFS tuned similarly? We're
thinking of possibly distributing our data using HDFS but storing a
sufficiently small amount of data per node so that the linux kernel could
buffer it all into memory. Is there much overhead in grabbing data from
HDFS if that data is stored locally?

> Map reduce is not generally useful for real-time applications. It is VERY
> useful for large scale data reductions done in advance of real-time
> operations.
>
> The basic issue is that the major performance contribution of map-reduce
> architectures is large scale sequential access of data stores. That is
> pretty much in contradiction with real-time response.

Gotcha. We'll consider switching to a batch-style approach, which it sounds
like Hadoop would be perfect for.

Thanks,
Jason
