ion-elgreco opened a new issue, #8372:
URL: https://github.com/apache/arrow-datafusion/issues/8372

   ### Describe the bug
   
   I am reading a parquet table where one parquet file contains large arrow 
types and another contains the normal arrow types. DataFusion rejects this 
with an error saying it cannot cast the file schema field to the table 
schema field. 
   
   This happens even when I provide ParquetReadOptions with a schema that 
uses the normal arrow types. PyArrow, for example, has no issue reading such 
a mixed large/normal-type parquet table.
   
   ```
   Plan("Cannot cast file schema field struct of type Struct([Field { name: \"x\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"y\", data_type: LargeUtf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]) to table schema field of type Struct([Field { name: \"x\", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"y\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }])")
   ```
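   
   For reference, two such mismatched files can be produced directly with the 
arrow/parquet crates. A minimal sketch (the file names and data are made up, 
and it uses a flat string column rather than the struct column from the error 
above; the Utf8 vs. LargeUtf8 mismatch is the same):
   
   ```rust
   use std::{fs::File, sync::Arc};
   
   use arrow::array::{LargeStringArray, StringArray};
   use arrow::datatypes::{DataType, Field, Schema};
   use arrow::record_batch::RecordBatch;
   use parquet::arrow::ArrowWriter;
   
   fn write_mixed_files() -> Result<(), Box<dyn std::error::Error>> {
       // File 1: column "y" written with the normal Utf8 type.
       let small_schema = Arc::new(Schema::new(vec![Field::new("y", DataType::Utf8, true)]));
       let small_batch = RecordBatch::try_new(
           small_schema.clone(),
           vec![Arc::new(StringArray::from(vec!["a", "b"]))],
       )?;
       let mut writer = ArrowWriter::try_new(File::create("small.parquet")?, small_schema, None)?;
       writer.write(&small_batch)?;
       writer.close()?;
   
       // File 2: the same column written with the LargeUtf8 type.
       let large_schema = Arc::new(Schema::new(vec![Field::new("y", DataType::LargeUtf8, true)]));
       let large_batch = RecordBatch::try_new(
           large_schema.clone(),
           vec![Arc::new(LargeStringArray::from(vec!["c", "d"]))],
       )?;
       let mut writer = ArrowWriter::try_new(File::create("large.parquet")?, large_schema, None)?;
       writer.write(&large_batch)?;
       writer.close()?;
       Ok(())
   }
   ```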
   
   ### To Reproduce
   
   ```rust
   use futures::{StreamExt, TryStreamExt};
   
   // Scan the mixed-type files with an explicit (normal-type) schema and
   // execute the plan as a boxed stream of record batches.
   let stream = context
       .ctx
       .read_parquet(locations, ParquetReadOptions::default().schema(&file_schema))
       .await?
       .execute_stream()
       .await?
       .map_err(|err| {
           ParquetError::General(format!("Z-order failed while scanning data: {:?}", err))
       })
       .boxed();
   ```
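   
   Putting the pieces together, a hypothetical end-to-end driver (reusing the 
made-up file names from the sketch above) which, as I read the report, should 
fail with the same Plan error as soon as the scan reaches the LargeUtf8 file:
   
   ```rust
   use datafusion::arrow::datatypes::{DataType, Field, Schema};
   use datafusion::error::Result;
   use datafusion::prelude::{ParquetReadOptions, SessionContext};
   use futures::TryStreamExt;
   
   #[tokio::main]
   async fn main() -> Result<()> {
       let ctx = SessionContext::new();
   
       // Table schema with the normal Utf8 type, mirroring the schema the
       // report passes through ParquetReadOptions.
       let schema = Schema::new(vec![Field::new("y", DataType::Utf8, true)]);
   
       let df = ctx
           .read_parquet(
               vec!["small.parquet", "large.parquet"],
               ParquetReadOptions::default().schema(&schema),
           )
           .await?;
   
       // Executing the scan over the LargeUtf8 file should trip the
       // "Cannot cast file schema field" plan error quoted above.
       let batches: Vec<_> = df.execute_stream().await?.try_collect().await?;
       println!("read {} batches", batches.len());
       Ok(())
   }
   ```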
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_

