Hi All,

I want to prepare a benchmark and a presentation for my Spark backend for Gora,
with Talat's help. I plan to follow the benchmarking approach used for Spark
by the University of California, Berkeley [1][2].

The dimensions of my benchmark are:

* Hadoop Map/Reduce
* Spark
* Hadoop Map/Reduce via Gora
* Spark via Gora

For this purpose, I would like to work with two types of datasets:

1) Data-intensive
2) CPU-intensive

First, is there an existing benchmark that shows the performance impact
of using Gora with Hadoop MapReduce?

Second, can you suggest any datasets (or tools) for my purposes (e.g.
Logistic Regression, PageRank, TeraSort [3], the Intel Hadoop benchmark
suite HiBench [4], etc.)?
Kind Regards,
Furkan KAMACI

[1] https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
[2] http://www.cs.berkeley.edu/~matei/papers/2010/hotcloud_spark.pdf
[3] http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
[4] https://github.com/intel-hadoop/HiBench
