Your sample code first selects the distinct zip codes, and then saves the rows
for each distinct zip code into a separate Parquet file.

So I think you can simply partition your data with the
`DataFrameWriter.partitionBy` API, e.g.,

df.repartition("zip_code").write.partitionBy("zip_code").parquet(.....)
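For context, `partitionBy` writes a Hive-style directory per distinct value of the partition column, and the preceding `repartition` shuffles rows so each zip code lands in one task (one output file per directory rather than many small files). Here is a minimal pure-Python sketch of the resulting on-disk layout, with no Spark required; the data, paths, and CSV format are illustrative only, and Spark would write Parquet part files instead:

```python
# Sketch of the Hive-style layout that partitionBy("zip_code") produces:
# one "zip_code=<value>" subdirectory per distinct value.
import csv
import os
import tempfile
from collections import defaultdict

rows = [
    {"zip_code": "10001", "name": "Alice"},
    {"zip_code": "10001", "name": "Bob"},
    {"zip_code": "94105", "name": "Carol"},
]

out_dir = tempfile.mkdtemp()

# Group rows by the partition column, mirroring the shuffle repartition() performs.
by_zip = defaultdict(list)
for row in rows:
    by_zip[row["zip_code"]].append(row)

# Write one directory per distinct value, as partitionBy does.
for zip_code, group in by_zip.items():
    part_dir = os.path.join(out_dir, f"zip_code={zip_code}")
    os.makedirs(part_dir)
    with open(os.path.join(part_dir, "part-00000.csv"), "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name"])  # the partition column is
        writer.writeheader()                             # encoded in the directory
        for row in group:                                # name, not in the file
            writer.writerow({"name": row["name"]})

print(sorted(os.listdir(out_dir)))  # ['zip_code=10001', 'zip_code=94105']
```

When this layout is later read back (e.g. with `spark.read.parquet`), Spark recovers the partition column from the directory names and can prune directories when you filter on it.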




-----
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 
--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Null-pointer-exception-with-RDD-while-computing-a-method-creating-dataframe-tp20308p20328.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
