I recommend using the data generators provided with MLlib to generate synthetic data for your scalability tests, provided they're well suited to your algorithms. They let you control things like the number of examples, the dimensionality of your dataset, and the number of partitions.
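For example, here's a minimal sketch using LinearDataGenerator from org.apache.spark.mllib.util (the parameter values and the output path are just placeholders; KMeansDataGenerator and friends follow the same pattern):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.util.LinearDataGenerator

    object GenTestData {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("GenTestData"))

        // 1M labeled examples, 100 features, label noise eps = 0.1,
        // spread over 64 partitions; vary these for your scaling runs.
        val data = LinearDataGenerator.generateLinearRDD(
          sc, nexamples = 1000000, nfeatures = 100, eps = 0.1, nparts = 64)

        data.saveAsTextFile("hdfs:///tmp/linear-test-data")
        sc.stop()
      }
    }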
As far as cluster setup goes, I usually launch spot instances with the spark-ec2 scripts, and then check out a repo which contains a simple driver application for my code. Then I have something crude like bash scripts running my program and collecting output (see the sketch at the end of this message). You could have a look at the spark-perf repo if you want something a little more principled/automatic.

- Evan

> On Oct 2, 2014, at 5:37 PM, Yu Ishikawa <yuu.ishikawa+sp...@gmail.com> wrote:
>
> Hi all,
>
> I am trying to contribute some machine learning algorithms to MLlib.
> I must evaluate their performance on a cluster, changing the input data
> size, the number of CPU cores, and their parameters.
>
> I would like to build my development version of Spark on EC2 automatically.
> Is there already a build script for a development version, like the spark-ec2
> script?
> Or if you have any good ideas on how to evaluate the performance of a developing
> MLlib algorithm on a Spark cluster like EC2, could you tell me?
>
> Best,
>
> -----
> -- Yu Ishikawa
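P.S. For reference, a spot-instance launch with spark-ec2 looks roughly like the following; the key pair, identity file, spot price, instance type, and cluster name are all placeholders, so check ./spark-ec2 --help for the full option list:

    # Launch a 4-slave cluster on spot instances:
    ./ec2/spark-ec2 --key-pair=my-keypair --identity-file=my-keypair.pem \
      --slaves=4 --spot-price=0.05 --instance-type=m3.xlarge \
      launch mllib-perf-test

    # (For a development build, spark-ec2 can also take a git hash via
    # --spark-version together with --spark-git-repo, if memory serves.)

    # Log in, run your driver scripts and collect output, then tear down:
    ./ec2/spark-ec2 --key-pair=my-keypair --identity-file=my-keypair.pem \
      login mllib-perf-test
    ./ec2/spark-ec2 destroy mllib-perf-test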