Furcy Pin created SPARK-31657:
---------------------------------
Summary: CSV Writer writes no header for empty DataFrames
Key: SPARK-31657
URL: https://issues.apache.org/jira/browse/SPARK-31657
Project: Spark
Issue Type: Bug
Components: Input/Output
Affects Versions: 2.4.1
Environment: Local PySpark 2.4.1
Reporter: Furcy Pin
When a DataFrame is written as CSV with the header option set to true,
no header is written if the DataFrame is empty.
As a result, processes that read the CSV back fail: the column names are lost and the DataFrame read back has no columns.
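The read-back failure can be illustrated without Spark. Below is a minimal sketch that uses Python's standard csv module as a stand-in for Spark's CSV writer and reader; write_csv and read_columns are hypothetical helper names for this illustration only. A header-only file still carries the column name, while an empty file carries none, which mirrors the empty "++" schema shown in the second example.

```python
import csv
import io


def write_csv(rows, fieldnames, header=True):
    """Serialize a list of dicts to a CSV string.

    The header line is written whenever header=True, even if rows is empty,
    which is the behavior this report expects from Spark's CSV writer.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    if header:
        writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def read_columns(text):
    """Return the column names found in the CSV header, or [] if there is none."""
    reader = csv.DictReader(io.StringIO(text))
    # fieldnames is None when the input has no first line at all.
    return reader.fieldnames or []


# One row: header and data both survive the round trip.
print(read_columns(write_csv([{"a": 1}], ["a"])))  # ['a']
# Zero rows but a header: the column name is still recoverable.
print(read_columns(write_csv([], ["a"])))  # ['a']
# Zero rows and no header (the failing case): the schema is lost.
print(read_columns(""))  # []
```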
Example (note the limit(0) in the second write):
{code:python}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.1
      /_/
Using Python version 2.7.17 (default, Nov 7 2019 10:07:09)
SparkSession available as 'spark'.
>>> df1 = spark.sql("SELECT 1 as a")
>>> df1.limit(1).write.mode("OVERWRITE").option("Header", True).csv("data/test/csv")
>>> spark.read.option("Header", True).csv("data/test/csv").show()
+---+
|  a|
+---+
|  1|
+---+
>>>
>>> df1.limit(0).write.mode("OVERWRITE").option("Header", True).csv("data/test/csv")
>>> spark.read.option("Header", True).csv("data/test/csv").show()
++
||
++
++
{code}
Expected behavior:
{code:python}
>>> df1.limit(0).write.mode("OVERWRITE").option("Header", True).csv("data/test/csv")
>>> spark.read.option("Header", True).csv("data/test/csv").show()
+---+
|  a|
+---+
+---+
{code}
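Until the writer emits the header itself, one possible workaround is to add a header-only file after the write. The sketch below is a hypothetical helper, not part of Spark. It assumes the output directory is on a local filesystem and that Spark names its data files part-*.csv; the header file name it writes (part-00000-header.csv) is an arbitrary choice.

```python
import csv
import glob
import os
import tempfile


def ensure_csv_header(output_dir, columns):
    """Hypothetical workaround, not a Spark API.

    If output_dir holds no non-empty part-*.csv file (the empty-DataFrame
    case), write a header-only CSV file there and return True. Otherwise
    leave the directory untouched and return False.
    """
    parts = glob.glob(os.path.join(output_dir, "part-*.csv"))
    if any(os.path.getsize(p) > 0 for p in parts):
        return False  # rows were written, so the header presumably was too
    # Arbitrary file name chosen for this sketch.
    header_path = os.path.join(output_dir, "part-00000-header.csv")
    with open(header_path, "w", newline="") as f:
        csv.writer(f).writerow(columns)
    return True


if __name__ == "__main__":
    # Simulate the empty-DataFrame output: a single zero-byte part file.
    out = tempfile.mkdtemp()
    open(os.path.join(out, "part-00000-empty.csv"), "w").close()
    print(ensure_csv_header(out, ["a"]))  # True
    with open(os.path.join(out, "part-00000-header.csv"), newline="") as f:
        print(repr(f.read()))  # 'a\r\n'
```

In the session above it would be called right after the limit(0) write, e.g. ensure_csv_header("data/test/csv", df1.columns). Whether Spark's reader picks up the header line when empty part files sit next to it is an assumption worth checking, and output on HDFS or object stores would need a filesystem API other than os/glob.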