VijayBhakuni created SPARK-35592:
------------------------------------
Summary: Spark creates only "_SUCCESS" file after empty dataFrame
is saved as parquet for partitioned data
Key: SPARK-35592
URL: https://issues.apache.org/jira/browse/SPARK-35592
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.0
Reporter: VijayBhakuni
Whenever an empty DataFrame is saved as Parquet with partitions, the
target directory contains only a _SUCCESS file.
Assume the DataFrame has 3 columns:
some_column_1, some_column_2, some_partition_column_1
and the target location for the DataFrame is /user/spark/df_name
*Current Result*: /user/spark/df_name/_SUCCESS
*Expected Result*:
/user/spark/df_name/some_partition_column_1=_HIVE_DEFAULT_PARTITION_/<some_spark_generated_file_name>.snappy.parquet
where that Parquet file carries the schema of the data.
Writing such a file ensures that any job reading this location does not
fail with:
Exception: org.apache.spark.sql.AnalysisException: Unable to infer schema for
Parquet. It must be specified manually.
*Steps for reproduce (Scala)*:
{code:java}
import org.apache.spark.sql.SaveMode
import spark.implicits._

// create an empty DF with schema
val inputDF = Seq(
    ("value1", "value2", "partition1"),
    ("value3", "value4", "partition2"))
  .toDF("some_column_1", "some_column_2", "some_partition_column_1")
  .where("1==2")

// write the dataframe into partitions
inputDF.write
  .partitionBy("some_partition_column_1")
  .mode(SaveMode.Overwrite)
  .parquet("/user/spark/df_name")

// read the dataframe back
// Throws: org.apache.spark.sql.AnalysisException: Unable to infer schema
// for Parquet. It must be specified manually.
val readDF = spark.read.parquet("/user/spark/df_name")
{code}
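As a workaround until the write side is fixed, the read can succeed if the schema is supplied explicitly, which skips Parquet schema inference entirely. A minimal sketch, assuming the reader already knows the schema (the column names and StringType fields below mirror the repro above):

{code:java}
import org.apache.spark.sql.types._

// schema assumed known by the reading job; all-String here to match the repro
val knownSchema = StructType(Seq(
  StructField("some_column_1", StringType),
  StructField("some_column_2", StringType),
  StructField("some_partition_column_1", StringType)))

// providing the schema avoids the inference step, so reading the
// empty, partition-less location no longer throws AnalysisException
val readDF = spark.read.schema(knownSchema).parquet("/user/spark/df_name")
{code}

This is only a mitigation on the reading side; the reported expectation is that the writer should emit a schema-bearing Parquet file even for an empty partitioned DataFrame.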
--
This message was sent by Atlassian Jira
(v8.3.4#803005)