Hey Patrick,

It's Ozgun from Citus Data. We'd like to make these benchmark results fair, and have tried different config settings for SparkSQL over the past month. We picked the best config settings we could find, and also contacted the Spark users list about running TPC-H numbers.
http://goo.gl/IU5Hw0
http://goo.gl/WQ1kML
http://goo.gl/ihLzgh

We also received advice at the Spark Summit '14 to wait until v1.1, and therefore re-ran our tests on SparkSQL 1.1. On the specific optimizations, Marco and Samay from our team have much more context, and I'll let them answer your questions on the different settings we tried.

Our intent is to be fair and not misrepresent SparkSQL's performance. On that front, we used publicly available documentation and user lists, and spent about a month trying to get the best Spark performance results. If there are specific optimizations we should have applied and missed, we'd love to be involved with the community in re-running the numbers.

Is this email thread the best place to continue the conversation?

Best,
Ozgun

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Surprising-Spark-SQL-benchmark-tp9041p9073.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.