HawaiianSpork commented on issue #5950: URL: https://github.com/apache/datafusion/issues/5950#issuecomment-2724596144
> > This should be fixed now by https://github.com/apache/datafusion/pull/10515. You can now override the schema used in the file scanner using the SchemaAdapter.

> > Doesn't the SchemaAdapter _convert_ the schema that was already read? So it doesn't really solve the issue.

> > Does passing in a schema to [FileScanConfig](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/struct.FileScanConfig.html#structfield.file_schema) not work? Or is this request specifically for a Python API?

The file_schema in FileScanConfig can be used to coerce the schema read from Parquet into the supplied schema using an Arrow cast. If, however, you need functionality beyond a cast (for example, adding columns that don't exist in some of the Parquet files), then a SchemaAdapter can be used to convert the data returned before it is used by DataFusion. This lets you extend the existing Parquet table provider; otherwise, a new table provider would need to be created.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
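
To make the distinction in the comment above concrete, here is a minimal std-only sketch of the two behaviours: a plain cast of an existing column, and an adapter step that also fills in columns missing from a file. This is a toy model, not DataFusion's or Arrow's API. `Column`, `DataType`, `cast_column`, `null_column`, and `adapt_batch` are all hypothetical names invented for illustration, and the real SchemaAdapter trait has a different shape.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a column of nullable values (hypothetical, not an Arrow array).
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Int32(Vec<Option<i32>>),
    Int64(Vec<Option<i64>>),
}

/// Toy stand-in for a logical type (hypothetical, not arrow's DataType).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

/// Cast-only step: the column must already exist in the file.
/// Only the identity cast and Int32 -> Int64 widening are modelled here.
fn cast_column(col: &Column, target: DataType) -> Option<Column> {
    match (col, target) {
        (Column::Int32(v), DataType::Int64) => {
            Some(Column::Int64(v.iter().map(|x| x.map(i64::from)).collect()))
        }
        (Column::Int32(_), DataType::Int32) | (Column::Int64(_), DataType::Int64) => {
            Some(col.clone())
        }
        // Narrowing is deliberately left unsupported in this sketch.
        (Column::Int64(_), DataType::Int32) => None,
    }
}

/// Build an all-null column of the requested type and length.
fn null_column(dt: DataType, rows: usize) -> Column {
    match dt {
        DataType::Int32 => Column::Int32(vec![None; rows]),
        DataType::Int64 => Column::Int64(vec![None; rows]),
    }
}

/// Adapter-style step: cast the columns the file has, and null-fill
/// table columns the file lacks. Returns None if any cast is unsupported.
fn adapt_batch(
    file: &BTreeMap<String, Column>,
    table_schema: &[(&str, DataType)],
    rows: usize,
) -> Option<Vec<Column>> {
    table_schema
        .iter()
        .map(|(name, dt)| match file.get(*name) {
            Some(col) => cast_column(col, *dt),
            None => Some(null_column(*dt, rows)),
        })
        .collect()
}
```

In this model, a file that only has an Int32 `id` column can still be read against a table schema of `id: Int64` plus a newer column: `adapt_batch` widens `id` and returns an all-null column for the one the file lacks, which a cast on its own cannot do.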