alamb opened a new pull request, #13101:
URL: https://github.com/apache/datafusion/pull/13101

   Replacement for https://github.com/apache/datafusion/pull/12092, which had too much history on it.
   
   Draft as it builds on:
   - [ ] https://github.com/apache/datafusion/pull/12816 @goldmedal 
   
   ## Which issue does this PR close?
   
   Closes https://github.com/apache/datafusion/issues/11682
   
   
   ## Rationale for this change
   
   Reading Parquet data as `StringViewArray` is significantly faster than reading it as `StringArray`. We have been testing this behind a feature flag, and it is now stable enough to enable by default.
   
   See https://github.com/apache/datafusion/issues/11603 and the blog posts:
   * https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/
   * https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/
   
   Benchmark Results
   (RUNNING)
   
   
   ## What changes are included in this PR?
   
   1. Set `datafusion.execution.parquet.schema_force_view_types` to `true` by default
   
   ## Are these changes tested?
   Yes, by CI tests
   
   
   ## Are there any user-facing changes?
   1. Faster reading of data from Parquet files
   
   If you see an error related to StringView use, you can disable this feature using the `datafusion.execution.parquet.schema_force_view_types` option:
   
   ```sql
   > set datafusion.execution.parquet.schema_force_view_types = false;
   0 row(s) fetched.
   Elapsed 0.000 seconds.
   ```
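   
   To check which value is in effect, or to turn the feature back on later, something like the following should work in `datafusion-cli`. This is a minimal sketch: it assumes the `SHOW` statement for configuration options is available in your session (it relies on the information schema, which `datafusion-cli` enables), and the output is omitted.
   
   ```sql
   -- Inspect the current value of the option
   > show datafusion.execution.parquet.schema_force_view_types;
   -- Re-enable reading Parquet columns as view types
   > set datafusion.execution.parquet.schema_force_view_types = true;
   ```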
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

