[ https://issues.apache.org/jira/browse/SPARK-46890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812040#comment-17812040 ]
Daniel commented on SPARK-46890:
--------------------------------

I tried the exact command from the bug description, and it doesn't cause any errors on the master branch:
{code:scala}
withTable("Products") {
  spark.sql(
    s"""
       |CREATE TABLE IF NOT EXISTS Products (
       |  product_id INT,
       |  name STRING,
       |  price FLOAT default 0.0,
       |  quantity INT default 0
       |)
       |USING CSV
       |OPTIONS (
       |  header 'true',
       |  inferSchema 'false',
       |  enforceSchema 'false',
       |  path "${testFile(productsFile)}"
       |)
       """.stripMargin)
  checkAnswer(
    spark.table("Products"),
    Seq(
      Row(1, "Apple", 0.50, 100),
      Row(2, "Banana", 0.25, 200),
      Row(3, "Orange", 0.75, 50)))
}
{code}
With the "products.csv" file containing:
{code:java}
product_id,name,price,quantity
1,Apple,0.50,100
2,Banana,0.25,200
3,Orange,0.75,50
{code}
This unit test passes.

> CSV fails on a column with default and without enforcing schema
> ---------------------------------------------------------------
>
>                 Key: SPARK-46890
>                 URL: https://issues.apache.org/jira/browse/SPARK-46890
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Max Gekk
>            Priority: Major
>
> When we create a table using CSV on an existing file with a header, and:
>  - a column has a default, and
>  - enforceSchema is false (so the CSV header is taken into account),
> and then query a column with a default, the query fails.
> The example below shows the issue:
> {code:sql}
> CREATE TABLE IF NOT EXISTS products (
>   product_id INT,
>   name STRING,
>   price FLOAT default 0.0,
>   quantity INT default 0
> )
> USING CSV
> OPTIONS (
>   header 'true',
>   inferSchema 'false',
>   enforceSchema 'false',
>   path '/Users/maximgekk/tmp/products.csv'
> );
> {code}
> The CSV file products.csv:
> {code:java}
> product_id,name,price,quantity
> 1,Apple,0.50,100
> 2,Banana,0.25,200
> 3,Orange,0.75,50
> {code}
> The query fails:
> {code:sql}
> spark-sql (default)> SELECT price FROM products;
> 24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
> java.lang.IllegalArgumentException: Number of column in CSV header is not equal to number of fields in the schema:
>  Header length: 4, schema size: 1
> CSV file: file:///Users/maximgekk/tmp/products.csv
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)