Maybe a benchmark is what I'd really like to know. Just as Hadoop has a benchmark showing it can sort 1TB of data in 62 seconds, how long would it take Mahout's Bayes algorithms to train a model on, say, 1GB of data?
Thank you

Jeff Zhang

---------- Forwarded message ----------
From: Sean Owen <[email protected]>
Date: Sat, Nov 21, 2009 at 10:44 PM
Subject: Re: Is there performance comparison document ?
To: [email protected]

I think we can already state the answer, though: it's going to take much more CPU time and resources to run a computation via Hadoop than to run it entirely on one machine (non-parallelized). Hadoop carries a lot of overhead.

However, some problems are too big to fit on one machine, so you have to parallelize with Hadoop. In that case there is no comparison; you can't run it without Hadoop.

Also, parallelizing means you can finish the computation in fewer wall-clock seconds. It will take more CPU-seconds, though. But then the Hadoop runtime is just a function of how many machines you throw at it and how parallelizable the problem is, so it's not much of a comparison.

Are you wondering how much overhead a framework like Hadoop adds?

On Sun, Nov 22, 2009 at 6:30 AM, Jeff Zhang <[email protected]> wrote:
> Hi all,
>
> Since Mahout is built on Hadoop, is there any performance comparison
> between the algorithms running with Hadoop and without it?
>
> Thank you.
>
> Jeff Zhang
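Sean's point that Hadoop buys fewer wall-clock seconds at the cost of more CPU-seconds can be made concrete with a toy cost model. This is only an illustrative sketch: the functions `serial_cost` and `hadoop_cost`, the fixed per-machine `startup_s`, and the proportional `overhead_frac` are hypothetical assumptions, not measurements of Hadoop or Mahout, and the model assumes the work splits perfectly evenly across machines.

```python
# Toy cost model with hypothetical parameters; NOT Hadoop or Mahout measurements.


def serial_cost(work_s: float) -> tuple[float, float]:
    """Run everything on one machine: wall-clock seconds == CPU-seconds."""
    return work_s, work_s


def hadoop_cost(work_s: float, machines: int, startup_s: float,
                overhead_frac: float) -> tuple[float, float]:
    """Assumed model: every machine pays a fixed job startup cost, and the
    framework inflates the total work by overhead_frac (serialization,
    shuffle, disk I/O). The inflated work is split evenly across machines.
    Returns (wall_clock_seconds, total_cpu_seconds)."""
    per_machine_work = work_s * (1.0 + overhead_frac) / machines
    wall = startup_s + per_machine_work
    # Each machine is busy for the whole wall-clock time, so CPU-seconds
    # add up to machines * startup_s + work_s * (1 + overhead_frac).
    cpu = machines * wall
    return wall, cpu


if __name__ == "__main__":
    work = 3600.0  # assumed: one hour of single-machine computation
    print("serial:", serial_cost(work))
    for n in (1, 4, 16, 64):
        wall, cpu = hadoop_cost(work, n, startup_s=30.0, overhead_frac=0.5)
        print(f"{n:3d} machines: wall={wall:8.1f}s cpu={cpu:9.1f}s")
```

With these assumed parameters (one hour of serial work, 30 s startup per machine, 50% framework overhead), 4 machines finish in 1380 s of wall-clock time instead of 3600 s but consume 5520 CPU-seconds, and the CPU total keeps rising as machines are added. Real figures for Mahout's Bayes training would still have to come from an actual benchmark.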
