alamb opened a new pull request, #12092: URL: https://github.com/apache/datafusion/pull/12092
Draft as it builds on:
- [ ] https://github.com/apache/datafusion/pull/12032/files
- [ ] TBF

Things left to do:
- [ ] File ticket for protobuf / substrait support

## Which issue does this PR close?

Closes https://github.com/apache/datafusion/issues/11682

## Rationale for this change

Reading data as `StringViewArray` is significantly faster than `StringArray`. We have been testing this behind a feature flag but it is now stable enough to enable by default.

Benchmark Results (RUNNING)

## What changes are included in this PR?

1. Set `schema_force_string_view` to true

## Are these changes tested?

Yes, by CI tests

## Are there any user-facing changes?

1. Faster reading of data from Parquet files

If you see an error related to StringView use, you can disable this feature using the `schema_force_string_view` option:

```sql
> set datafusion.execution.parquet.schema_force_string_view = false;
0 row(s) fetched.
Elapsed 0.000 seconds.
```
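
To confirm which value is in effect, the option can be inspected before and after changing it. This is a minimal sketch, assuming a `datafusion-cli` session in which `information_schema` is enabled (`SHOW` relies on it); output is omitted because it depends on the DataFusion version and session defaults.

```sql
-- Inspect the current value of the option
SHOW datafusion.execution.parquet.schema_force_string_view;

-- Disable reading Parquet string columns as StringView for this session
SET datafusion.execution.parquet.schema_force_string_view = false;

-- Check that the change took effect
SHOW datafusion.execution.parquet.schema_force_string_view;

-- Re-enable it once the StringView-related error has been resolved
SET datafusion.execution.parquet.schema_force_string_view = true;
```

The setting is per session, so it needs to be applied again in each new session that hits the error.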