I have some old data from a personal experiment. A year ago, CBayes model generation from a subset of Wikipedia (3 GB out of 17 GB) on a cluster of 6 Pentium HT 3.0 GHz machines with 100 Mbps switched Ethernet took 15 mins. An additional 5 mins were used to generate the 3 GB dataset from the 17 GB, bringing the total time to approx. 20 mins.
Note that Hadoop sorted 1 TB using 4000 quad-core/dual-core systems over gigabit/multi-gigabit connections, so there is no comparison.

I hope this info helps.

Robin

On Sun, Nov 22, 2009 at 12:26 PM, Jeff Zhang <[email protected]> wrote:
> Maybe benchmark is what I like to know accurately,
>
> Just like hadoop has a benchmark that it can sort 1TB data in 62 seconds, so
> the same, how much time will it take mahout's bayes algorithms to train a
> model using data like 1GB?
>
>
> Thank you
>
> Jeff Zhang
>
>
> ---------- Forwarded message ----------
> From: Sean Owen <[email protected]>
> Date: Sat, Nov 21, 2009 at 10:44 PM
> Subject: Re: Is there performance comparison document ?
> To: [email protected]
>
>
> I think we can already state the answer though: it's going to take
> much more CPU time and resources to run a computation via Hadoop than
> run it completely on one machine (non-parallelized). Hadoop is a lot
> of overhead.
>
> However some problems are too big to fit on one machine, so you have
> to parallelize with Hadoop. In that case, there is no comparison --
> you can't run it without Hadoop.
>
> Also, parallelizing means you can finish the computation in fewer
> wall-clock seconds. It'll take more CPU-seconds though. But then the
> Hadoop runtime is just a function of how many machines you throw at it
> and how parallelizable it is, so it's not much of a comparison.
>
> Are you wondering how much the overhead is, of a framework like Hadoop?
>
> On Sun, Nov 22, 2009 at 6:30 AM, Jeff Zhang <[email protected]> wrote:
>> Hi all,,
>>
>> Since mahout is build upon hadoop, so is there any performance comparison
>> between the algorithms using hadoop and without using hadoop. ?
>>
>> Thank you.
>>
>> Jeff Zhang
>>
>
