Paride Casulli created SPARK-45908:
--------------------------------------

             Summary: write empty parquet file while using partitioned write
                 Key: SPARK-45908
                 URL: https://issues.apache.org/jira/browse/SPARK-45908
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.5.0
            Reporter: Paride Casulli
Hi, I'm currently using PySpark, and when I write an empty DataFrame to a Parquet file with a partitioned write, no file is produced in the target folder:

    df.write.mode("overwrite").partitionBy("BUSINESS_DATE").parquet("/data_dir/"+stg+"/ISS/exchange/WORK_ISSR_EOD_EXT_SETTLEMENT_CA_"+se)

This creates a problem because I have another job which reads the file, cannot infer the schema, and raises an error. I made a workaround in this way:

    # implemented to also handle empty data
    def write_partitioned_df(df, partition_col, partition_val, save_path):
        df.write.mode("overwrite").partitionBy(partition_col).parquet(save_path)
        if df.isEmpty():
            df = df.drop(partition_col)
            df.write.mode("overwrite").parquet(save_path + "/" + partition_col + "=" + partition_val)

It writes an empty Parquet file into the target partition folder, but it would be great to have an option on the write function that avoids this custom implementation. I have seen other users interested in this feature asking on StackOverflow.

Thank you very much
Paride


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org