Paride Casulli created SPARK-45908:
--------------------------------------

             Summary: write empty parquet file while using partitioned write
                 Key: SPARK-45908
                 URL: https://issues.apache.org/jira/browse/SPARK-45908
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.5.0
            Reporter: Paride Casulli


Hi,

I'm currently using PySpark, and if I try to write an empty DataFrame to Parquet 
with a partitioned write, no file is written in the target folder:

df.write.mode("overwrite").partitionBy("BUSINESS_DATE").parquet("/data_dir/"+stg+"/ISS/exchange/WORK_ISSR_EOD_EXT_SETTLEMENT_CA_"+se)
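
A minimal sketch that reproduces the behaviour (the schema, column names and 
the /tmp target path below are only illustrative):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

# empty DataFrame with an explicit schema, including the partition column
schema = StructType([
    StructField("ISSR_ID", StringType()),
    StructField("BUSINESS_DATE", StringType()),
])
df = spark.createDataFrame([], schema)

# the partitioned write leaves only a _SUCCESS marker in the target folder,
# no parquet data file, so the schema is lost
df.write.mode("overwrite").partitionBy("BUSINESS_DATE").parquet("/tmp/empty_partitioned")

# a later read of the same path then fails with
# "Unable to infer schema for Parquet. It must be specified manually."
spark.read.parquet("/tmp/empty_partitioned")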

 

This creates a problem because I have another job which reads the file: since 
no schema can be inferred from an empty folder, it raises an error. My current 
workaround is the following:

# implemented to manage empty data as well
def write_partitioned_df(df, partition_col, partition_val, save_path):
    df.write.mode("overwrite").partitionBy(partition_col).parquet(save_path)
    if df.isEmpty():
        # the partitioned write produced no file, so write an empty
        # (schema-only) parquet directly into the partition directory
        df = df.drop(partition_col)
        df.write.mode("overwrite").parquet(save_path + "/" + partition_col + "=" + partition_val)
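
For example, keeping my original call site (the date value here is only 
illustrative):

write_partitioned_df(df, "BUSINESS_DATE", "2023-11-13", "/data_dir/"+stg+"/ISS/exchange/WORK_ISSR_EOD_EXT_SETTLEMENT_CA_"+se)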

 

This writes an empty (schema-only) Parquet file in the target folder, but it 
would be great to have an option in the write function that avoids this custom 
implementation. I see other users interested in this feature asking about it 
on StackOverflow.
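
In the meantime, the reading job can also be protected by supplying an 
explicit schema, which skips inference entirely; a sketch, assuming the schema 
is known in advance (field names are illustrative):

from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("ISSR_ID", StringType()),
    StructField("BUSINESS_DATE", StringType()),
])

# with a user-supplied schema, reading a folder that contains no parquet
# files yields an empty DataFrame instead of failing on schema inference
df = spark.read.schema(schema).parquet(save_path)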

 

Thank you very much

Paride

 


