Hi there,

I noticed in the latest Spark SQL programming guide <https://spark.apache.org/docs/latest/sql-programming-guide.html> that there is support for optimized reading of partitioned Parquet files that follow a particular directory structure (year=1/month=10/day=3, for example). However, I see no analogous way to write DataFrames as Parquet files with a similar directory structure based on user-provided partitioning.
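For concreteness, this is the kind of layout and read path I am referring to (paths, column names, and the exact read call are illustrative only; I am just assuming the partition discovery behavior described in the guide):

    // Illustrative layout only -- partition columns are encoded in the directory names:
    //   /data/events/year=2015/month=10/day=3/part-r-00000.parquet
    //   /data/events/year=2015/month=10/day=4/part-r-00000.parquet

    import org.apache.spark.sql.SQLContext

    // sc is an existing SparkContext (e.g. from spark-shell)
    val sqlContext = new SQLContext(sc)

    // Pointing the reader at the root directory discovers `year`, `month`, and `day`
    // as partition columns, so they can be used in projections and filters.
    // (On newer versions this would be sqlContext.read.parquet("/data/events").)
    val events = sqlContext.parquetFile("/data/events")
    events.filter("year = 2015 AND month = 10 AND day = 3").count()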
Generally, is it possible to write DataFrames as partitioned Parquet files that downstream partition discovery can take advantage of later? I considered extending the Parquet output format, but it looks like ParquetTableOperations.scala hard-codes the output format to AppendingParquetOutputFormat. Also, I was wondering whether it would be valuable to contribute support for writing Parquet into partition directories as a PR.

Thanks,
-Matt Cheah
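P.S. To make the desired output concrete, here is a rough sketch of the manual workaround I have in mind today; the column names, paths, and the helper itself are purely illustrative:

    import org.apache.spark.sql.DataFrame

    // Rough sketch: write one Parquet directory per distinct partition value so that
    // the resulting layout matches what partition discovery expects on read.
    // Column names, the base path, and the driver-side loop are illustrative only.
    def writePartitioned(df: DataFrame, basePath: String): Unit = {
      // Collect the distinct (year, month, day) combinations to the driver.
      val partitions = df.select("year", "month", "day").distinct().collect()

      partitions.foreach { row =>
        val (y, m, d) = (row.getInt(0), row.getInt(1), row.getInt(2))
        // One filtered write per partition value. The partition columns are still
        // stored inside the files here; a built-in writer could presumably drop
        // them, since they are recoverable from the directory names.
        df.filter(s"year = $y AND month = $m AND day = $d")
          .saveAsParquetFile(s"$basePath/year=$y/month=$m/day=$d")
      }
    }

This requires one filter pass over the DataFrame per partition value, which is part of why built-in support for writing into partition directories seems worth having.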