Hello Spark community,
I wanted to ask whether any work has been done on porting TeraSort
(TeraGen/TeraSort/TeraValidate) from Hadoop to Spark on EC2/EMR.
I am looking for guidance on lessons learned from this or similar efforts.
We are benchmarking some of the newer EC2 instances to determine how to
optimize in-memory processing with Spark on them, for AWS customers looking
to move their data processing workloads to Spark.
Any guidance the community can provide on this effort would be greatly appreciated!
Thanks,
Dario Rivera
Solutions Architect
Cell: 571-205-2731
Email: dar...@amazon.com