alamb opened a new issue, #17517: URL: https://github.com/apache/datafusion/issues/17517
### Describe the bug

I was playing around with the DataFusion CSV parser by using the example from https://duckdb.org/2025/09/08/duckdb-on-the-framework-laptop-13 but DataFusion refused to load it into Parquet.

### To Reproduce

Get the data:

```shell
wget https://blobs.duckdb.org/nl-railway/railway-services-80-months.zip
unzip railway-services-80-months.zip
```

Then run:

```shell
mkdir services-parquet
datafusion-cli
```

Convert each file to Parquet:

```sql
COPY 'services/services-2019.csv' TO 'services-parquet/services-2019.parquet';
COPY 'services/services-2020.csv' TO 'services-parquet/services-2020.parquet';
COPY 'services/services-2021.csv' TO 'services-parquet/services-2021.parquet';
COPY 'services/services-2022.csv' TO 'services-parquet/services-2022.parquet';
COPY 'services/services-2023.csv' TO 'services-parquet/services-2023.parquet';
COPY 'services/services-2024.csv' TO 'services-parquet/services-2024.parquet';
COPY 'services/services-2025-01.csv' TO 'services-parquet/services-2025-01.parquet';
COPY 'services/services-2025-02.csv' TO 'services-parquet/services-2025-02.parquet';
COPY 'services/services-2025-03.csv' TO 'services-parquet/services-2025-03.parquet';
COPY 'services/services-2025-04.csv' TO 'services-parquet/services-2025-04.parquet';
COPY 'services/services-2025-05.csv' TO 'services-parquet/services-2025-05.parquet';
COPY 'services/services-2025-06.csv' TO 'services-parquet/services-2025-06.parquet';
COPY 'services/services-2025-07.csv' TO 'services-parquet/services-2025-07.parquet';
COPY 'services/services-2025-08.csv' TO 'services-parquet/services-2025-08.parquet';
```

And then run:

```sql
DataFusion CLI v49.0.2
> select * from 'services-parquet' limit 10;
Arrow error: Schema error: Fail to merge schema field 'Stop:Arrival time' because the from data_type = Timestamp(Second, None) does not equal Utf8
```

### Expected behavior

I expect to be able to read the data correctly.

### Additional context

One error is that the type of the `Stop:Arrival time` column differs between the converted files. Sometimes it is a timestamp and sometimes a string:

```sql
> describe 'services-parquet/services-2020.parquet';
+------------------------------+-----------+-------------+
| column_name                  | data_type | is_nullable |
+------------------------------+-----------+-------------+
| Service:RDT-ID               | Int64     | YES         |
| Service:Date                 | Date32    | YES         |
| Service:Type                 | Utf8View  | YES         |
| Service:Company              | Utf8View  | YES         |
| Service:Train number         | Int64     | YES         |
| Service:Completely cancelled | Boolean   | YES         |
| Service:Partly cancelled     | Boolean   | YES         |
| Service:Maximum delay        | Int64     | YES         |
| Stop:RDT-ID                  | Int64     | YES         |
| Stop:Station code            | Utf8View  | YES         |
| Stop:Station name            | Utf8View  | YES         |
| Stop:Arrival time            | Utf8View  | YES         |
| Stop:Arrival delay           | Utf8View  | YES         |
| Stop:Arrival cancelled       | Utf8View  | YES         |
| Stop:Departure time          | Utf8View  | YES         |
| Stop:Departure delay         | Utf8View  | YES         |
| Stop:Departure cancelled     | Utf8View  | YES         |
+------------------------------+-----------+-------------+
17 row(s) fetched.
Elapsed 0.009 seconds.

> describe 'services-parquet/services-2021.parquet';
+------------------------------+-------------------------+-------------+
| column_name                  | data_type               | is_nullable |
+------------------------------+-------------------------+-------------+
| Service:RDT-ID               | Int64                   | YES         |
| Service:Date                 | Date32                  | YES         |
| Service:Type                 | Utf8View                | YES         |
| Service:Company              | Utf8View                | YES         |
| Service:Train number         | Int64                   | YES         |
| Service:Completely cancelled | Boolean                 | YES         |
| Service:Partly cancelled     | Boolean                 | YES         |
| Service:Maximum delay        | Int64                   | YES         |
| Stop:RDT-ID                  | Int64                   | YES         |
| Stop:Station code            | Utf8View                | YES         |
| Stop:Station name            | Utf8View                | YES         |
| Stop:Arrival time            | Timestamp(Second, None) | YES         | <--- Note this field type is different
| Stop:Arrival delay           | Int64                   | YES         |
| Stop:Arrival cancelled       | Boolean                 | YES         |
| Stop:Departure time          | Utf8View                | YES         |
| Stop:Departure delay         | Utf8View                | YES         |
| Stop:Departure cancelled     | Utf8View                | YES         |
+------------------------------+-------------------------+-------------+
17 row(s) fetched.
Elapsed 0.008 seconds.
```
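
To make the mismatch easier to see, here is a small Python sketch that compares the two schemas shown in the `describe` outputs above. The column types are transcribed by hand from those outputs. The names `SCHEMA_2020`, `SCHEMA_2021`, `diff_schemas`, and `MISMATCHES` are made up for this illustration and are not part of DataFusion. The script only compares the reported types; it does not read the Parquet files and says nothing about why the types were inferred differently.

```python
# Column types transcribed from the `describe` output for services-2020.parquet.
SCHEMA_2020 = {
    "Service:RDT-ID": "Int64",
    "Service:Date": "Date32",
    "Service:Type": "Utf8View",
    "Service:Company": "Utf8View",
    "Service:Train number": "Int64",
    "Service:Completely cancelled": "Boolean",
    "Service:Partly cancelled": "Boolean",
    "Service:Maximum delay": "Int64",
    "Stop:RDT-ID": "Int64",
    "Stop:Station code": "Utf8View",
    "Stop:Station name": "Utf8View",
    "Stop:Arrival time": "Utf8View",
    "Stop:Arrival delay": "Utf8View",
    "Stop:Arrival cancelled": "Utf8View",
    "Stop:Departure time": "Utf8View",
    "Stop:Departure delay": "Utf8View",
    "Stop:Departure cancelled": "Utf8View",
}

# Column types transcribed from the `describe` output for services-2021.parquet.
SCHEMA_2021 = {
    "Service:RDT-ID": "Int64",
    "Service:Date": "Date32",
    "Service:Type": "Utf8View",
    "Service:Company": "Utf8View",
    "Service:Train number": "Int64",
    "Service:Completely cancelled": "Boolean",
    "Service:Partly cancelled": "Boolean",
    "Service:Maximum delay": "Int64",
    "Stop:RDT-ID": "Int64",
    "Stop:Station code": "Utf8View",
    "Stop:Station name": "Utf8View",
    "Stop:Arrival time": "Timestamp(Second, None)",
    "Stop:Arrival delay": "Int64",
    "Stop:Arrival cancelled": "Boolean",
    "Stop:Departure time": "Utf8View",
    "Stop:Departure delay": "Utf8View",
    "Stop:Departure cancelled": "Utf8View",
}


def diff_schemas(left, right):
    """Return {column: (left_type, right_type)} for columns whose types differ.

    A column that is missing on one side is reported with None for that side.
    """
    mismatches = {}
    # Walk the columns of `left` first, then any columns only present in `right`.
    for column in list(left) + [c for c in right if c not in left]:
        left_type = left.get(column)
        right_type = right.get(column)
        if left_type != right_type:
            mismatches[column] = (left_type, right_type)
    return mismatches


MISMATCHES = diff_schemas(SCHEMA_2020, SCHEMA_2021)

if __name__ == "__main__":
    for column, (type_2020, type_2021) in MISMATCHES.items():
        print(f"{column}: 2020={type_2020}, 2021={type_2021}")
```

Besides `Stop:Arrival time`, the comparison also reports `Stop:Arrival delay` and `Stop:Arrival cancelled`, which are `Utf8View` in the 2020 file but `Int64` and `Boolean` in the 2021 file. The same check could be repeated for the other files by filling in a dictionary from each file's `describe` output.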
