Not sure if comparing Hadoop to databases is an apples-to-apples comparison. Hadoop is a complete job execution framework, which co-locates the data with the computation. I suppose DBMS-X and Vertica do that to a certain extent, by way of SQL, but you're restricted to SQL. If you want to, say, build a distributed web crawler or a complex data processing pipeline, Hadoop will schedule those processes across a cluster for you, while Vertica and DBMS-X only deal with the storage of the data.

The choice of experiments seemed skewed towards DBMS-X and Vertica. I think everybody is aware that Map-Reduce is inefficient for handling SQL-like queries and joins. It's also worth noting that, I think, 4 of the 7 authors either currently work or have at one time worked with Vertica (or C-Store, the precursor to Vertica).

Andy

On Tue, Apr 14, 2009 at 10:16 AM, Guilherme Germoglio wrote:
> (Hadoop is used in the benchmarks)
>
> http://database.cs.brown.edu/sigmod09/
>
> There is currently considerable enthusiasm around the MapReduce
> (MR) paradigm for large-scale data analysis [17]. Although the
> basic control flow of this framework has existed in parallel SQL
> database management systems (DBMS) for over 20 years, some
> have called MR a dramatically new computing model [8, 17]. In
> this paper, we describe and compare both paradigms. Furthermore,
> we evaluate both kinds of systems in terms of performance and
> development complexity. To this end, we define a benchmark
> consisting of a collection of tasks that we have run on an open source
> version of MR as well as on two parallel DBMSs. For each task,
> we measure each system's performance for various degrees of
> parallelism on a cluster of 100 nodes. Our results reveal some
> interesting trade-offs. Although the process to load data into and tune
> the execution of parallel DBMSs took much longer than the MR
> system, the observed performance of these DBMSs was strikingly
> better. We speculate about the causes of the dramatic performance
> difference and consider implementation concepts that future
> systems should take from both kinds of architectures.
>
> --
> Guilherme
>
> homepage: http://germoglio.googlepages.com
