dqkqd commented on PR #17796:
URL: https://github.com/apache/datafusion/pull/17796#issuecomment-3341017167
The test failed. It checks that the columns of an empty table are inferred as
`Utf8`.
DuckDB does the same, so I think this behavior is correct:
```bash
D CREATE TABLE empty AS
SELECT * FROM read_csv_auto('empty.csv');
D select * from empty;
┌─────────┬─────────┬─────────┐
│   c1    │   c2    │   c3    │
│ varchar │ varchar │ varchar │
├─────────┴─────────┴─────────┤
│           0 rows            │
└─────────────────────────────┘
```
When I checked how DuckDB handles a table with an all-null column, it inferred
that column as `VARCHAR`:
```bash
D CREATE TABLE has_nulls_column AS
SELECT * FROM read_csv_auto('has_nulls_column.csv');
D select * from has_nulls_column;
┌───────┬───────┬─────────┐
│  c1   │  c2   │   c3    │
│ int64 │ int64 │ varchar │
├───────┼───────┼─────────┤
│     1 │     2 │ NULL    │
│     3 │     4 │ NULL    │
└───────┴───────┴─────────┘
```
However, DataFusion infers such columns as `Null`. I think we should infer them
as `Utf8` instead:
```bash
> CREATE EXTERNAL TABLE has_nulls_column STORED AS CSV LOCATION
'has_nulls_column.csv' OPTIONS ('format.has_header' 'true');
0 row(s) fetched.
Elapsed 0.025 seconds.
> select column_name, data_type, ordinal_position from
information_schema.columns where table_name='has_nulls_column';
+-------------+-----------+------------------+
| column_name | data_type | ordinal_position |
+-------------+-----------+------------------+
| c1          | Int64     | 0                |
| c2          | Int64     | 1                |
| c3          | Null      | 2                |
+-------------+-----------+------------------+
3 row(s) fetched.
Elapsed 0.010 seconds.
```
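To make the proposed rule concrete, here is a toy Python model of CSV type inference. It is only a sketch under stated assumptions, not DataFusion's or arrow-csv's actual inference code: the helpers `infer_column_type` / `infer_schema` and the `all_null_type` parameter are hypothetical, empty fields are assumed to mean NULL, and `has_nulls_column.csv` is assumed to contain rows like `1,2,` and `3,4,`. Passing `all_null_type="Null"` mimics the current behavior; the default `"Utf8"` mimics the proposal, and the same fallback also covers the empty-table case.

```python
import csv
import io


def infer_column_type(values, all_null_type="Utf8"):
    """Toy inference for one CSV column (hypothetical helper, not DataFusion's API).

    Empty strings are treated as NULL. If every value is NULL, or there are no
    rows at all, the column falls back to `all_null_type`.
    """
    non_null = [v for v in values if v != ""]
    if not non_null:
        # No evidence about the type: this is the case under discussion.
        return all_null_type
    try:
        for v in non_null:
            int(v)
        return "Int64"
    except ValueError:
        pass
    try:
        for v in non_null:
            float(v)
        return "Float64"
    except ValueError:
        return "Utf8"


def infer_schema(csv_text, all_null_type="Utf8"):
    """Infer a {column_name: type_name} mapping from CSV text with a header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    # Transpose rows into columns; an empty file yields empty columns.
    columns = [[row[i] for row in rows] for i in range(len(header))]
    return {
        name: infer_column_type(col, all_null_type)
        for name, col in zip(header, columns)
    }


# Assumed contents of has_nulls_column.csv (third column always empty).
has_nulls = "c1,c2,c3\n1,2,\n3,4,\n"
print(infer_schema(has_nulls, all_null_type="Null"))  # {'c1': 'Int64', 'c2': 'Int64', 'c3': 'Null'}
print(infer_schema(has_nulls))  # {'c1': 'Int64', 'c2': 'Int64', 'c3': 'Utf8'}
print(infer_schema("c1,c2,c3\n"))  # {'c1': 'Utf8', 'c2': 'Utf8', 'c3': 'Utf8'}
```

With a single fallback type, an all-null column and a column from an empty file end up with the same type, which is the consistency the comment argues for.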