mbutrovich opened a new issue, #1707: URL: https://github.com/apache/datafusion-comet/issues/1707
### What is the problem the feature request solves?

When using the `native_datafusion` or `native_iceberg_compat` Parquet readers, which are based on DataFusion's `DataSourceExec`, the schemas that Comet passes in cause dictionary-encoded columns to be unpacked (decoded into plain arrays) immediately.

### Describe the potential solution

arrow-rs uses a provided schema as a hint and, when the hint requests a dictionary type for a dictionary-encoded column, preserves the encoding: https://github.com/apache/arrow-rs/blob/880be2f0a0b9675d8b42206e70543472a58792aa/parquet/src/arrow/schema/primitive.rs#L91

The challenge is similar to INT96: the native side does not have the Parquet schema when it generates the `DataSourceExec`. We would either need to pass this information from the Spark side early on, when the schema is first read, or add a coercion rule to DataFusion.

### Additional context

_No response_
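The hint behaviour referenced above can be illustrated with a simplified, stdlib-only model. This is a sketch under stated assumptions, not the arrow-rs code: the `DataType` enum is a hypothetical stand-in for arrow's `DataType`, and `apply_hint` only mimics the dictionary branch of the hint logic (arrow-rs also applies hints for timestamps, intervals, decimals and others, which are not modelled here). It shows why a Spark-derived schema with plain `Utf8` leads to unpacked values, while a `Dictionary(Int32, Utf8)` hint would keep the encoding.

```rust
/// Hypothetical stand-in for arrow's DataType; only the variants needed here.
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Binary,
    Dictionary(Box<DataType>, Box<DataType>),
}

/// Simplified model of applying a schema hint to the Arrow type inferred
/// from a Parquet column. Only the dictionary branch is modelled.
fn apply_hint(parquet: DataType, hint: DataType) -> DataType {
    match hint {
        DataType::Dictionary(key, value) => {
            // Resolve the dictionary's value type against the Parquet type first.
            let hinted = apply_hint(parquet, (*value).clone());
            if hinted == *value {
                // Value type matches: keep the dictionary encoding the hint asked for.
                DataType::Dictionary(key, value)
            } else {
                // Incompatible value type: fall back to the resolved inner type.
                hinted
            }
        }
        // Other hints are not modelled: the Parquet-derived type is kept.
        _ => parquet,
    }
}

fn main() {
    let dict = DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8));

    // Plain Utf8 hint (as from a Spark-derived schema): values are materialised.
    assert_eq!(apply_hint(DataType::Utf8, DataType::Utf8), DataType::Utf8);

    // Dictionary hint over a Utf8 column: the dictionary encoding is preserved.
    assert_eq!(apply_hint(DataType::Utf8, dict.clone()), dict);

    // Dictionary hint whose value type does not match: falls back to the column type.
    assert_eq!(apply_hint(DataType::Binary, dict), DataType::Binary);

    println!("all hint cases behave as modelled");
}
```

In this model, a coercion rule on the native side would amount to rewriting the requested `Utf8` type into a `Dictionary` type before the hint is applied, which is only safe if the reader knows the column is dictionary-encoded; that is the missing Parquet schema information described above.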