Re: Starting a new Spark codebase, Python or Scala / Java?

2016-11-21 Thread Anthony May
A sensible default strategy is to use the language in which a system was developed, or a highly compatible one. For Spark that would be Scala; however, I assume you don't currently know Scala as well as Python, or at all. In that case, to help you make the decision, you should

Re: Starting a new Spark codebase, Python or Scala / Java?

2016-11-21 Thread Jon Gregg
Spark is written in Scala, so yes, it's still the strongest option. You also get the Dataset type with Scala (compile-time type safety), which is not available in Python. That said, I think the Python API is a viable candidate if you use Pandas for Data Science. There are

Starting a new Spark codebase, Python or Scala / Java?

2016-11-21 Thread Brandon White
Hello all, I will be starting a new Spark codebase and I would like to get opinions on using Python over Scala. Historically, the Scala API has always been the strongest interface to Spark. Is this still true? Are there still many benefits and additional features in the Scala API that are not