wiedld opened a new pull request, #12232: URL: https://github.com/apache/datafusion/pull/12232
## Which issue does this PR close?

Closes #12123

## Rationale for this change

On write: the parquet file is written with a utf8/large-utf8 and binary/large-binary schema (so that schema is in the file metadata). On read: we would like to be able to read these columns as the more performant view types.

Previous work already uses `schema_force_string_view` to read into utf8view and binaryview, by [passing around a bool to the ParquetOpener](https://github.com/apache/datafusion/blob/bd506980bd04c109d9fa979be5b627580e59d267/datafusion/core/src/datasource/physical_plan/parquet/opener.rs#L104-L106). This PR focuses on getting the parquet statistics to compute properly on read when reading as view types.

## What changes are included in this PR?

* move `schema_force_string_view` up a few lines, to sit with the "read" (not write) config options.
* remove the passing around of bools
  * this was done by merging table_schema (with views) and file_schema (without views)
* add tests which run with true|false for `schema_force_string_view`

**WIP, still debugging:** there is an earlier draft PR [which has all CI](https://github.com/apache/datafusion/pull/11862/files#r1727710645) tests passing when `schema_force_string_view=true`. However, I'm not quite sure how that worked, since the [page level statistics](https://github.com/apache/arrow-rs/blob/master/parquet/src/arrow/arrow_reader/statistics.rs#L796) are not implemented for view types. I made a test to capture this gap.

Additionally, we still have statistics failing, but only for one test case (not the others using view types). Tomorrow I'll chase down what's different in that test.

## Are these changes tested?

Yes.

## Are there any user-facing changes?

No.
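For readers unfamiliar with the schema merge mentioned in the change list, here is a rough sketch of the idea, not the code in this PR. It assumes the merge is done per column by name: where the table schema asks for a view type and the file schema holds the matching non-view string or binary type, the view type wins; otherwise the file's type is kept. `DataType`, `Field`, `field`, and `merge_view_types` are hypothetical, simplified stand-ins and are not the arrow-rs or DataFusion types.

```rust
// Hypothetical, simplified stand-ins for illustration only.
// These are NOT the arrow-rs `DataType` / `Field` types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    LargeUtf8,
    Binary,
    LargeBinary,
    Utf8View,
    BinaryView,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

// Small helper to build a field (hypothetical).
fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type }
}

/// Assumed merge rule: walk the file schema (no views) and, for each column,
/// look up the column of the same name in the table schema (with views).
/// If the table wants a view type and the file has the matching non-view
/// string/binary type, use the view type. Otherwise keep the file's type.
fn merge_view_types(table_schema: &[Field], file_schema: &[Field]) -> Vec<Field> {
    file_schema
        .iter()
        .map(|file_field| {
            let table_type = table_schema
                .iter()
                .find(|t| t.name == file_field.name)
                .map(|t| &t.data_type);
            let data_type = match (table_type, &file_field.data_type) {
                (Some(DataType::Utf8View), DataType::Utf8 | DataType::LargeUtf8) => {
                    DataType::Utf8View
                }
                (Some(DataType::BinaryView), DataType::Binary | DataType::LargeBinary) => {
                    DataType::BinaryView
                }
                // No view requested, or the types do not correspond: keep the file type.
                (_, other) => other.clone(),
            };
            Field { name: file_field.name.clone(), data_type }
        })
        .collect()
}
```

With a merged schema like this, the reader no longer needs a separate bool to know which columns should come back as views, which is the motivation given above for removing the flag plumbing.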
