[jira] [Commented] (SPARK-26208) Empty dataframe does not roundtrip for csv with header

Ranga Reddy (Jira) Mon, 09 Aug 2021 07:34:07 -0700


    [ https://issues.apache.org/jira/browse/SPARK-26208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17396076#comment-17396076 ]


Ranga Reddy commented on SPARK-26208:
-------------------------------------

The above code works only when the DataFrame is created manually.

The issue still persists when the DataFrame is created by reading a Hive table.

*Hive Table:*
{code:java}
CREATE EXTERNAL TABLE `test_empty_csv_table`( 
 `col1` bigint, 
 `col2` bigint) 
STORED AS ORC 
LOCATION '/tmp/test_empty_csv_table';{code}
*spark-shell:*
{code:java}
val tableName = "test_empty_csv_table"
val emptyCSVFilePath = "/tmp/empty_csv_file"
val df = spark.sql("select * from "+tableName)
df.printSchema()
df.write.format("csv").option("header", true).mode("overwrite").save(emptyCSVFilePath)
val df2 = spark.read.option("header", true).csv(emptyCSVFilePath)
{code}
 
{code:java}
org.apache.spark.sql.AnalysisException: Unable to infer schema for CSV. It must be specified manually.;
 at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:208)
 at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:208)
 at scala.Option.getOrElse(Option.scala:121)
 at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:207)
 at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:393)
 at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
 at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:596)
 at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:473)
 ... 49 elided{code}
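
As the exception message suggests, the read side can be made to succeed by specifying the schema manually instead of relying on inference. The following is only a sketch of that workaround, not a fix for the missing header. It assumes a spark-shell session (so {{spark}} is already defined) and reuses the table name and path from the reproduction above.

```scala
// Workaround sketch: pass the schema explicitly so the CSV reader does not
// have to infer it from output that may contain no header line at all.
// Assumes spark-shell, where `spark` (SparkSession) is predefined.
val tableName = "test_empty_csv_table"
val emptyCSVFilePath = "/tmp/empty_csv_file"

val df = spark.sql("select * from " + tableName)
df.write.format("csv").option("header", true).mode("overwrite").save(emptyCSVFilePath)

// Reuse the schema of the source DataFrame; with a user-specified schema
// the reader skips schema inference.
val df2 = spark.read
  .option("header", true)
  .schema(df.schema)
  .csv(emptyCSVFilePath)

df2.printSchema()
```

This only avoids the failure when the caller already knows the schema. A consumer that has nothing but the CSV output still cannot recover the column names, which is the roundtrip problem this issue describes.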

> Empty dataframe does not roundtrip for csv with header
> ------------------------------------------------------
>
>                 Key: SPARK-26208
>                 URL: https://issues.apache.org/jira/browse/SPARK-26208
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>         Environment: master branch,
> commit 034ae305c33b1990b3c1a284044002874c343b4d,
> date:   Sun Nov 18 16:02:15 2018 +0800
>            Reporter: koert kuipers
>            Assignee: Koert Kuipers
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> When we write an empty part file for CSV with header=true, we fail to write
> the header. The result cannot be read back in.
> When header=true, a part file with zero rows should still have a header.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
