alamb opened a new pull request, #13101: URL: https://github.com/apache/datafusion/pull/13101
Replacement for https://github.com/apache/datafusion/pull/12092, which had too much history on it.

Draft, as it builds on:
- [ ] https://github.com/apache/datafusion/pull/12816 @goldmedal

## Which issue does this PR close?

Closes https://github.com/apache/datafusion/issues/11682

## Rationale for this change

Reading data as `StringViewArray` is significantly faster than reading it as `StringArray`. We have been testing this behind a feature flag, and it is now stable enough to enable by default.

See https://github.com/apache/datafusion/issues/11603 and the blog posts:
* https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/
* https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/

Benchmark results: (RUNNING)

## What changes are included in this PR?

1. Set `schema_force_view_types` to `true` by default

## Are these changes tested?

Yes, by CI tests

## Are there any user-facing changes?

1. Faster reading of data from Parquet files

If you see an error related to `StringView`, you can disable this feature with the `datafusion.execution.parquet.schema_force_view_types` option:

```sql
> set datafusion.execution.parquet.schema_force_view_types = false;
0 row(s) fetched.
Elapsed 0.000 seconds.
```
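A minimal sketch of checking and changing this setting from `datafusion-cli`, assuming a local Parquet file. The table name `hits`, the file `hits.parquet`, and the `"URL"` column are hypothetical examples. The sketch also assumes the Parquet schema is fixed when the table is registered, so it sets the option before `CREATE EXTERNAL TABLE`.

```sql
-- Show the current value (SHOW needs information_schema, which datafusion-cli enables)
SHOW datafusion.execution.parquet.schema_force_view_types;

-- Opt out before registering the table; we assume the Parquet schema is
-- inferred at CREATE time, so the option must be set first
SET datafusion.execution.parquet.schema_force_view_types = false;

-- Hypothetical table name and file path, for illustration only
CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION 'hits.parquet';

-- Report the Arrow type the (hypothetical) "URL" string column was read as
SELECT arrow_typeof("URL") FROM hits LIMIT 1;
```

Running the same statements with the option left at its new default of `true` should let you compare the string type reported by `arrow_typeof` in each case.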