[ https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812034#comment-17812034 ]
Daniel edited comment on SPARK-46890 at 1/29/24 7:31 PM:
---------------------------------------------------------

This unit test does not seem to reproduce the problem:
{code:java}
test("SPARK-46890: CSV fails on a column with default and without enforcing schema") {
  withTable("CarsTable") {
    spark.sql(
      s"""
         |CREATE TABLE CarsTable(
         |  year INT,
         |  make STRING,
         |  model STRING,
         |  comment STRING DEFAULT '',
         |  blank STRING DEFAULT '')
         |USING csv
         |OPTIONS (
         |  header "true",
         |  inferSchema "false",
         |  enforceSchema "false",
         |  path "${testFile(carsFile)}"
         |)
       """.stripMargin)
    checkAnswer(
      spark.table("CarsTable"),
      Seq(
        Row(2012, "Tesla", "S", "No comment", null),
        Row(1997, "Ford", "E350", "Go get one now they are going fast", null),
        Row(2015, "Chevy", "Volt", "", "")
      ))
  }
}
{code}
With the "cars.csv" file containing:
{code:java}
year,make,model,comment,blank
"2012","Tesla","S","No comment",
1997,Ford,E350,"Go get one now they are going fast",
2015,Chevy,Volt
{code}
Will look further.
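
For reference, the "Header length: 4, schema size: 1" text in the error quoted in the issue description suggests that the header was compared against only the selected column rather than the full table schema. That is an assumption drawn from the message alone, not a confirmed diagnosis. The sketch below is a simplified, hypothetical model of such a length check in plain Scala; the names HeaderCheckSketch and checkHeader are invented for illustration and are not Spark APIs.
{code:scala}
// Hypothetical, simplified model of a CSV header check; not Spark's actual code.
object HeaderCheckSketch {

  /** Returns None when the header is accepted, or Some(message) describing the mismatch. */
  def checkHeader(header: Seq[String], schemaFields: Seq[String]): Option[String] =
    if (header.length != schemaFields.length) {
      // Different number of columns: report both sizes.
      Some(s"Header length: ${header.length}, schema size: ${schemaFields.length}")
    } else {
      // Same number of columns: report the first name that differs (case-insensitive).
      header.zip(schemaFields).collectFirst {
        case (h, f) if !h.equalsIgnoreCase(f) => s"Header: $h, Schema: $f"
      }
    }

  def main(args: Array[String]): Unit = {
    val header = Seq("product_id", "name", "price", "quantity")
    val fullSchema = header           // all four table columns
    val prunedSchema = Seq("price")   // assumed schema for SELECT price FROM products

    println(checkHeader(header, fullSchema))   // None
    println(checkHeader(header, prunedSchema)) // Some(Header length: 4, schema size: 1)
  }
}
{code}
If this model is right, a test would need to select a single column with a default (for example, only "comment") rather than read the whole table in order to hit the mismatch.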
> CSV fails on a column with default and without enforcing schema
> ---------------------------------------------------------------
>
>                 Key: SPARK-46890
>                 URL: https://issues.apache.org/jira/browse/SPARK-46890
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Max Gekk
>            Priority: Major
>
> The problem occurs when we create a table using CSV on an existing file with a header, where:
> - a column has a default, and
> - enforceSchema is false, so the CSV header is taken into account,
> and then query a column with a default.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}