I completely agree with Owen on this point. Let's move all discussions to dev lists and jira: http://issues.apache.org/jira/browse/HIVE-396
I was confused by seeing so many automatic emails in the dev mailing list.

Zheng

-----Original Message-----
From: Owen O'Malley [mailto:owen.omal...@gmail.com]
Sent: Friday, June 19, 2009 10:03 AM
To: core-user@hadoop.apache.org; pig-u...@hadoop.apache.org; hive-u...@hadoop.apache.org
Subject: Re: A simple performance benchmark for Hadoop, Hive and Pig

On Thu, Jun 18, 2009 at 9:29 PM, Zheng Shao <zs...@facebook.com> wrote:
> Yuntao Jia, our intern this summer, did a simple performance benchmark for
> Hadoop, Hive and Pig based on the queries in the SIGMOD 2009 paper: A
> Comparison of Approaches to Large-Scale Data Analysis

It should be noted that no one on the Pig team was involved in setting up the benchmarks and the queries don't follow the Pig cookbook suggestions for writing efficient queries, so these results should be considered *extremely* preliminary. Furthermore, I can't see any way that Hive should be able to beat raw map/reduce, since Hive uses map/reduce to run the job.

In the future, it would be better to involve the respective communities (mapreduce-dev and pig-dev) far before pushing benchmark results out to the user lists.

The Hadoop project, which includes all three subprojects, needs to be a cooperative community that is trying to build the best software we can. Getting benchmark numbers is good, but are better done in a collaborative manner.

-- Owen