Max Gekk created SPARK-46890:
--------------------------------
Summary: CSV fails on a column with default and without enforcing schema
Key: SPARK-46890
URL: https://issues.apache.org/jira/browse/SPARK-46890
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.0.0
Reporter: Max Gekk
Assignee: Max Gekk
A query can fail when we create a table using CSV on an existing file with a header and:
- a column has a default value, and
- enforceSchema is false, so the CSV header is taken into account.
The example below shows the issue:
{code:sql}
CREATE TABLE IF NOT EXISTS products (
product_id INT,
name STRING,
price FLOAT default 0.0,
quantity INT default 0
)
USING CSV
OPTIONS (
header 'true',
inferSchema 'false',
enforceSchema 'false',
path '/Users/maximgekk/tmp/products.csv'
);
{code}
The CSV file products.csv:
{code}
product_id,name,price,quantity
1,Apple,0.50,100
2,Banana,0.25,200
3,Orange,0.75,50
{code}
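To reproduce the report locally, the input file can be generated with a short script. This is only a convenience sketch: the rows are copied from the listing above, while the helper name `write_products_csv` and the temporary output directory are assumptions made for illustration. The `path` option in the CREATE TABLE statement has to point at wherever the file ends up.

```python
import csv
import tempfile
from pathlib import Path

# Rows copied from the report. The first row is the header, which matters
# because the table is declared with header 'true' and enforceSchema 'false'.
ROWS = [
    ["product_id", "name", "price", "quantity"],
    ["1", "Apple", "0.50", "100"],
    ["2", "Banana", "0.25", "200"],
    ["3", "Orange", "0.75", "50"],
]


def write_products_csv(directory: str) -> Path:
    """Write products.csv into `directory` and return its path (hypothetical helper)."""
    path = Path(directory) / "products.csv"
    with path.open("w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(ROWS)
    return path


# The temporary directory is an assumption; any readable location works.
csv_path = write_products_csv(tempfile.mkdtemp())
print(csv_path)
```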
The query fails:
{code:sql}
spark-sql (default)> SELECT price FROM products;
24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
java.lang.IllegalArgumentException: Number of column in CSV header is not equal
to number of fields in the schema:
Header length: 4, schema size: 1
CSV file: file:///Users/maximgekk/tmp/products.csv
{code}
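The numbers in the message suggest that the header check compared the four header columns against a one-field schema, which would match a schema reduced to the single selected column `price`. That reading is an assumption drawn from the error text, not a confirmed root cause. The sketch below is not Spark's code: `check_header` and the schema lists are hypothetical names that only illustrate how such a length comparison passes with the full schema and fails with a pruned one.

```python
def check_header(header, schema_fields, file_name):
    """Hypothetical length check modeled on the error message in the report."""
    if len(header) != len(schema_fields):
        raise ValueError(
            "Number of column in CSV header is not equal to number of fields "
            "in the schema:\n"
            f" Header length: {len(header)}, schema size: {len(schema_fields)}\n"
            f"CSV file: {file_name}"
        )


header = ["product_id", "name", "price", "quantity"]
full_schema = ["product_id", "name", "price", "quantity"]
# Assumption: only the column requested by SELECT price reaches the check.
pruned_schema = ["price"]

# The full schema has as many fields as the header, so no error is raised.
check_header(header, full_schema, "products.csv")

# The one-field schema does not, so the check raises.
try:
    check_header(header, pruned_schema, "products.csv")
    mismatch = None
except ValueError as e:
    mismatch = str(e)

print(mismatch)
```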