Hi All,
PFB a sample of my code:
val df = spark.read.parquet(....)
df.createOrReplaceTempView("df")  // registerTempTable is deprecated
val zip = df.select("zip_code").distinct().as[String].rdd

def comp(zipcode: String): Unit = {
  val zipval = s"SELECT * FROM df WHERE zip_code = '$zipcode'"
  val data = spark.sql(zipval)  // throws NullPointerException when called inside zip.map
  data.write.parquet(......)
}

val sam = zip.map(x => comp(x))
sam.count
But when I do val zip = df.select("zip_code").distinct().as[String].rdd.collect
and call the function on each element, the data is computed correctly, but only sequentially.
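A workaround I have been experimenting with (just a sketch, assuming comp and spark are defined as above) is to collect the zip codes to the driver and loop with a Scala parallel collection, so spark.sql always runs on the driver while several Spark jobs are submitted concurrently:

```scala
// The collected zip codes live on the driver, so calling spark.sql
// inside this loop is safe; .par lets several Spark jobs be in
// flight at once instead of running strictly one after another.
// (On Scala 2.13 .par needs the scala-parallel-collections module.)
val zipCodes: Array[String] =
  df.select("zip_code").distinct().as[String].collect()

zipCodes.par.foreach(zipcode => comp(zipcode))
```
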
I would like to know why the map over the RDD throws a null
pointer exception, and whether there is a way to compute the comp function for each
zip code in parallel, i.e. run multiple zip codes at the same time.
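In case it helps frame the question: if the end goal is simply one parquet output per zip code, I have also been looking at a partitioned write, which avoids calling spark.sql once per zip code entirely (a sketch; the output path below is a placeholder, not my real path):

```scala
// Writes one sub-directory per distinct zip_code value,
// e.g. .../zip_code=94107/, in a single distributed job
// rather than one Spark job per zip code.
df.write
  .partitionBy("zip_code")
  .parquet("/path/to/output")  // placeholder path
```
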
Any clues or inputs are appreciated.
Regards.