alamb opened a new pull request, #12092:
URL: https://github.com/apache/datafusion/pull/12092

   
   Draft as it builds on:
   - [ ] https://github.com/apache/datafusion/pull/12032/files
   - [ ] TBF
   
   Things left to do:
   - [ ] File ticket for protobuf / substrait support
   
   
   ## Which issue does this PR close?
   
   Closes https://github.com/apache/datafusion/issues/11682
   
   ## Rationale for this change
   
   Reading Parquet string data as `StringViewArray` is significantly faster
than reading it as `StringArray`. We have been testing this behind a feature
flag, and it is now stable enough to enable by default.
   
   Benchmark Results (RUNNING)
   
   ## What changes are included in this PR?
   
   1. Set `datafusion.execution.parquet.schema_force_string_view` to `true` by default
   
   ## Are these changes tested?
   Yes, by CI tests
   
   
   ## Are there any user-facing changes?
   1. Faster reading of data from Parquet files
   
   If you see an error related to `StringView` use, you can disable this feature
by setting the `schema_force_string_view` option to `false`:
   
   ```sql
   > set datafusion.execution.parquet.schema_force_string_view = false;
   0 row(s) fetched.
   Elapsed 0.000 seconds.
   ```
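   
   To check the current value, or to turn the feature back on after disabling it, something like the following should work in `datafusion-cli`. This is a minimal sketch: it assumes `SHOW` is usable in your session (it relies on `information_schema` being enabled, which is the `datafusion-cli` default), and the output is omitted because it depends on your version and settings.
   
   ```sql
   -- Inspect the current value of the option
   SHOW datafusion.execution.parquet.schema_force_string_view;
   
   -- Re-enable reading Parquet string columns as StringView
   SET datafusion.execution.parquet.schema_force_string_view = true;
   ```
   
   The setting applies to the current session only, so it has to be set again in each new session.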
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


