alamb opened a new pull request, #11723: URL: https://github.com/apache/datafusion/pull/11723
Draft until:
- [ ] Get tests passing
- [ ] Run benchmarks and fix any regressions
- [ ] Close https://github.com/apache/datafusion/issues/10918 and file remaining follow-on work for StringView

## Which issue does this PR close?

Closes https://github.com/apache/datafusion/issues/11682

## Rationale for this change

@XiangpengHao @a10y @PsiACE @Weijun-H and others have been working this summer on using `StringView` to improve performance: https://github.com/apache/datafusion/issues/10918

It is currently disabled behind a config setting. Let's turn it on and improve query performance against parquet files.

## What changes are included in this PR?

1. Set `datafusion.execution.parquet.schema_force_string_view` to true by default
2. Update tests

## Are these changes tested?

Yes, by CI.

Benchmark results: TBR

## Are there any user-facing changes?

Yes. With this change, all strings will be read from parquet files as `StringView` by default. This should result in a significant performance improvement for queries that involve string columns, especially for highly selective ones.

It can still be disabled via the config setting if needed.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org