The more Spark code I write, the more I hit the same use cases where the
Scala APIs feel a bit awkward. I'd love to understand whether there are
historical reasons for these, and whether there's opportunity + interest to
improve the APIs. Here are my top two:
1. registerTempTable() returns Unit, which breaks method chaining:
def cachedDF(path: String, tableName: String) = {
  val df = sqlContext.read.load(path).cache()
  df.registerTempTable(tableName)
  df
}

// vs.

def cachedDF(path: String, tableName: String) =
  sqlContext.read.load(path).cache().registerTempTable(tableName)
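For what it's worth, (1) can be worked around today with an enrichment.
Here's a rough sketch, using a made-up method name (registerTempTableAs),
of an implicit class that registers the table and hands back the DataFrame
so the call can sit in a fluent chain:

import org.apache.spark.sql.DataFrame

object DataFrameOps {
  implicit class RichDataFrame(val df: DataFrame) extends AnyVal {
    // Register the temp table, then return the same DataFrame
    // so the call can be chained.
    def registerTempTableAs(tableName: String): DataFrame = {
      df.registerTempTable(tableName)
      df
    }
  }
}

// Usage:
// import DataFrameOps._
// def cachedDF(path: String, tableName: String) =
//   sqlContext.read.load(path).cache().registerTempTableAs(tableName)

The cleaner fix, of course, would be for registerTempTable itself to return
the DataFrame.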
2. No toDF() implicit for creating a DataFrame from an RDD + schema:
val schema: StructType = ...
val rdd = sc.textFile(...)
  .map(...)
  .aggregate(...)
val df = sqlContext.createDataFrame(rdd, schema)

// vs.

val schema: StructType = ...
val df = sc.textFile(...)
  .map(...)
  .aggregate(...)
  .toDF(schema)
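To make (2) concrete, here's a sketch of the kind of implicit I have in
mind, restricted to RDD[Row] since that's what createDataFrame(rdd, schema)
accepts, and assuming a SQLContext is available in implicit scope (the
names RowRDDOps/RichRowRDD are made up):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.StructType

object RowRDDOps {
  implicit class RichRowRDD(val rdd: RDD[Row]) extends AnyVal {
    // Delegates to createDataFrame, picking up the SQLContext implicitly
    // so the conversion can end a transformation chain.
    def toDF(schema: StructType)(implicit sqlContext: SQLContext): DataFrame =
      sqlContext.createDataFrame(rdd, schema)
  }
}

// Usage:
// import RowRDDOps._
// implicit val sqlCtx: SQLContext = sqlContext
// val df: DataFrame = someRowRdd.toDF(schema)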
Have you encountered other examples where small, low-risk API tweaks could
make common use cases more consistent + simpler to code?
/Sim