[https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812034#comment-17812034]
Daniel edited comment on SPARK-46890 at 1/29/24 7:31 PM:
---------------------------------------------------------
This unit test does not seem to reproduce the problem:
{code:java}
test("SPARK-46890: CSV fails on a column with default and without enforcing schema") {
  withTable("CarsTable") {
    spark.sql(
      s"""
         |CREATE TABLE CarsTable(
         |  year INT,
         |  make STRING,
         |  model STRING,
         |  comment STRING DEFAULT '',
         |  blank STRING DEFAULT '')
         |USING csv
         |OPTIONS (
         |  header "true",
         |  inferSchema "false",
         |  enforceSchema "false",
         |  path "${testFile(carsFile)}"
         |)
       """.stripMargin)
    checkAnswer(
      spark.table("CarsTable"),
      Seq(
        Row(2012, "Tesla", "S", "No comment", null),
        Row(1997, "Ford", "E350", "Go get one now they are going fast", null),
        Row(2015, "Chevy", "Volt", "", "")
      ))
  }
}
{code}
With the "cars.csv" file containing:
{code:java}
year,make,model,comment,blank
"2012","Tesla","S","No comment",
1997,Ford,E350,"Go get one now they are going fast",
2015,Chevy,Volt
{code}
Will look further.
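For reference, the rows that the test above expects can be modeled with the small sketch below. It is a hypothetical simplification, not Spark's CSV reader: the names `DefaultFillSketch`, `Column`, `tokenize` and `toRow` are illustrative only. It assumes that a field that is present but empty is read as null, while a field that is missing from a short record takes the column's DEFAULT value. It keeps every value as a string and splits naively on commas.
{code:scala}
// Hypothetical model of the rows the test above expects; not Spark's CSV reader.
object DefaultFillSketch {
  // A table column with an optional DEFAULT value (None = no default).
  final case class Column(name: String, default: Option[String])

  // Naive tokenizer: splits on every comma (limit -1 keeps trailing empty fields)
  // and strips surrounding double quotes. Quoted fields containing commas are not handled.
  def tokenize(line: String): Vector[String] =
    line.split(",", -1).toVector.map(_.stripPrefix("\"").stripSuffix("\""))

  // Assumption: a present-but-empty field becomes null (None), and a field
  // missing from a short record falls back to the column's default.
  def toRow(line: String, columns: Seq[Column]): Seq[Option[String]] = {
    val tokens = tokenize(line)
    columns.zipWithIndex.map { case (col, i) =>
      if (i < tokens.length) Some(tokens(i)).filter(_.nonEmpty)
      else col.default
    }
  }

  def main(args: Array[String]): Unit = {
    val cars = Seq(
      Column("year", None), Column("make", None), Column("model", None),
      Column("comment", Some("")), Column("blank", Some("")))
    Seq(
      "\"2012\",\"Tesla\",\"S\",\"No comment\",",
      "2015,Chevy,Volt"
    ).foreach(line => println(toRow(line, cars)))
  }
}
{code}
Under this model, the first record's trailing empty `blank` field comes out as null. The short `2015,Chevy,Volt` record instead gets the `''` defaults for `comment` and `blank`. This matches the `Row(...)` values in the `checkAnswer` call above.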
> CSV fails on a column with default and without enforcing schema
> ---------------------------------------------------------------
>
> Key: SPARK-46890
> URL: https://issues.apache.org/jira/browse/SPARK-46890
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Max Gekk
> Priority: Major
>
> When we create a table using CSV on an existing file with a header, and:
> - a column has a default, and
> - enforceSchema is false (so the CSV header is taken into account),
> then querying a column that has a default fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
> product_id INT,
> name STRING,
> price FLOAT default 0.0,
> quantity INT default 0
> )
> USING CSV
> OPTIONS (
> header 'true',
> inferSchema 'false',
> enforceSchema 'false',
> path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not equal to number of fields in the schema:
> Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}
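The error text suggests that, with enforceSchema set to false, the header was compared against a schema that contains only the selected column (schema size 1), not against the full four-column table schema. The following is a hypothetical, simplified sketch of such a header check. The names `HeaderCheckSketch` and `checkHeader` are illustrative, and the code is not Spark's implementation. The case-insensitive name comparison is an assumption.
{code:scala}
// Hypothetical model of a CSV header check; not Spark's implementation.
object HeaderCheckSketch {
  // Returns Left(message) if the header does not line up with the schema, else Right(()).
  def checkHeader(header: Seq[String], schemaFields: Seq[String], file: String): Either[String, Unit] =
    if (header.length != schemaFields.length)
      Left(
        "Number of column in CSV header is not equal to number of fields in the schema:\n" +
          s" Header length: ${header.length}, schema size: ${schemaFields.length}\n" +
          s"CSV file: $file")
    else
      header.zip(schemaFields).collectFirst {
        // Assumption: names are compared case-insensitively.
        case (h, f) if !h.equalsIgnoreCase(f) => s"Header column '$h' does not match schema field '$f'"
      }.toLeft(())

  def main(args: Array[String]): Unit = {
    val header = Seq("product_id", "name", "price", "quantity")
    // Full table schema: lengths and names match.
    println(checkHeader(header, header, "products.csv"))
    // Only the queried column: the lengths differ, as in the reported error.
    println(checkHeader(header, Seq("price"), "products.csv"))
  }
}
{code}
When the header is checked against the full four-column schema, the check passes. When it is checked against `price` alone, it returns a Left whose message reports `Header length: 4, schema size: 1`, the same mismatch shown in the log above.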
--
This message was sent by Atlassian Jira
(v8.20.10#820010)