Your sample code first selects the distinct zipcodes, and then saves the rows for each distinct zipcode into a separate Parquet file.
So I think you can simply partition your data with the `DataFrameWriter.partitionBy` API, e.g.:

df.repartition("zip_code").write.partitionBy("zip_code").parquet(...)

-----
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Null-pointer-exception-with-RDD-while-computing-a-method-creating-dataframe-tp20308p20328.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
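For completeness, here is a minimal self-contained sketch of that approach in Scala. The app name, the toy data, and the output path are illustrative, not from the original thread; it only assumes a DataFrame with a `zip_code` column, as in the one-liner above:

```scala
import org.apache.spark.sql.SparkSession

object PartitionByZip {
  def main(args: Array[String]): Unit = {
    // Local SparkSession for the sketch; in a real job this comes from
    // your cluster configuration.
    val spark = SparkSession.builder()
      .appName("partition-by-zip")   // illustrative app name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy data standing in for the real table.
    val df = Seq(
      ("94105", "Alice"),
      ("94105", "Bob"),
      ("10001", "Carol")
    ).toDF("zip_code", "name")

    // repartition("zip_code") shuffles rows with the same zipcode into the
    // same partition, so the subsequent partitionBy write produces roughly
    // one file per zipcode directory (e.g. .../zip_code=94105/part-*.parquet)
    // rather than one file per zipcode per task.
    df.repartition($"zip_code")
      .write
      .partitionBy("zip_code")
      .parquet("/tmp/zips_parquet")  // illustrative output path

    spark.stop()
  }
}
```

Reading the output back with `spark.read.parquet("/tmp/zips_parquet")` recovers `zip_code` as a partition column, so filters like `WHERE zip_code = '94105'` can prune to a single directory.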