Maybe a benchmark is what I would really like to see.

Just as Hadoop has a benchmark showing that it can sort 1TB of data in 62
seconds, how long would Mahout's Bayes algorithms take to train a model on,
say, 1GB of data?


Thank you

Jeff Zhang


---------- Forwarded message ----------
From: Sean Owen <[email protected]>
Date: Sat, Nov 21, 2009 at 10:44 PM
Subject: Re: Is there performance comparison document ?
To: [email protected]


I think we can already state the answer though: it's going to take
much more CPU time and resources to run a computation via Hadoop than
to run it completely on one machine (non-parallelized). Hadoop adds a
lot of overhead.

However some problems are too big to fit on one machine, so you have
to parallelize with Hadoop. In that case, there is no comparison --
you can't run it without Hadoop.

Also, parallelizing means you can finish the computation in fewer
wall-clock seconds. It'll take more CPU-seconds though. But then the
Hadoop runtime is just a function of how many machines you throw at it
and how parallelizable it is, so it's not much of a comparison.
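
To make the wall-clock versus CPU-seconds trade-off concrete, here is a
rough sketch of a toy cost model. It is not taken from Hadoop or Mahout:
the function name hadoop_estimate, its parameters, and the numbers are all
made up for illustration, and it assumes a fixed startup overhead per
machine and a parallel portion that divides evenly across machines.

```python
def hadoop_estimate(single_machine_cpu_s, machines, overhead_per_machine_s,
                    parallel_fraction):
    """Toy model (hypothetical): return (wall_clock_s, total_cpu_s)."""
    # Work that cannot be split runs on one machine regardless.
    serial = single_machine_cpu_s * (1.0 - parallel_fraction)
    # Work that can be split is divided evenly across machines.
    parallel = single_machine_cpu_s * parallel_fraction
    # Every machine pays the framework overhead at the same time,
    # so it adds to wall-clock time once...
    wall_clock = serial + parallel / machines + overhead_per_machine_s
    # ...but it is paid in CPU-seconds on every machine.
    total_cpu = single_machine_cpu_s + overhead_per_machine_s * machines
    return wall_clock, total_cpu


if __name__ == "__main__":
    # Assumed numbers: a 1000 CPU-second job, 90% parallelizable,
    # 30 seconds of framework overhead per machine.
    for n in (1, 10, 100):
        wall, cpu = hadoop_estimate(1000.0, n, 30.0, 0.9)
        print(f"{n:>3} machines: wall-clock {wall:.0f}s, CPU {cpu:.0f}s")
```

With these assumed inputs, adding machines lowers wall-clock time while
total CPU-seconds keep rising, and on a single machine the Hadoop run is
slower than the plain one by the overhead alone.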

Are you wondering how much overhead a framework like Hadoop adds?

On Sun, Nov 22, 2009 at 6:30 AM, Jeff Zhang <[email protected]> wrote:
> Hi all,
>
> Since Mahout is built upon Hadoop, is there any performance comparison
> between the algorithms running with Hadoop and without Hadoop?
>
> Thank you.
>
> Jeff Zhang
>
