Hi All,

I want to prepare a benchmark and a presentation for my Spark backend for Gora, with Talat's help. I plan to follow the benchmarking approach used for Spark by the University of California, Berkeley [1][2].

Dimensions of my benchmark:

* Hadoop Map/Reduce
* Spark
* Hadoop Map/Reduce via Gora
* Spark via Gora

To that end, I would like to work on two types of datasets:

1) Data-intensive
2) CPU-intensive

First of all, is there any existing benchmark that shows the performance effect of using Gora with Hadoop Map/Reduce?

Secondly, can you suggest any datasets (or tools) for my purposes (e.g. Logistic Regression, PageRank, TeraSort [3], Intel-Hadoop Benchmark [4])?

Kind Regards,
Furkan KAMACI

[1] https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
[2] http://www.cs.berkeley.edu/~matei/papers/2010/hotcloud_spark.pdf
[3] http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
[4] https://github.com/intel-hadoop/HiBench

