alamb opened a new pull request, #11723:
URL: https://github.com/apache/datafusion/pull/11723

   Draft until:
   - [ ] get tests passing
   - [ ] Run benchmarks and fix any regressions
   - [ ] Close https://github.com/apache/datafusion/issues/10918 and file 
follow-on issues for the remaining StringView work
   
   ## Which issue does this PR close?
   Closes https://github.com/apache/datafusion/issues/11682
   
   
   ## Rationale for this change
   
   @XiangpengHao @a10y @PsiACE @Weijun-H and others have been working this 
summer on using `StringView` to improve performance: 
https://github.com/apache/datafusion/issues/10918
   
   `StringView` is currently disabled behind a config setting. Let's turn it on 
to improve query performance against Parquet files.
   
   ## What changes are included in this PR?
   1. set `datafusion.execution.parquet.schema_force_string_view` to true by 
default
   2. Update tests
   
   ## Are these changes tested?
   
   Yes by CI
   
   Benchmark results: TBR
   
   ## Are there any user-facing changes?
   Yes. With this change, all strings will be read from Parquet files as 
`StringView` by default.
   This should result in a significant performance improvement for queries that 
involve string columns, especially highly selective ones.
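
   To illustrate why selective queries stand to benefit, here is a minimal 
sketch of the view layout defined by the Arrow format: each value is a 
fixed-size view that either holds up to 12 bytes inline or points into a 
shared data buffer, so filtering copies views rather than string bytes. This 
is a simplified model, not the arrow-rs implementation; `View`, `ViewArray`, 
`from_strs`, `value`, and `filter` are hypothetical names used only here.

```rust
/// Simplified model of one 16-byte view (hypothetical type, not arrow-rs).
#[derive(Debug, Clone, Copy, PartialEq)]
enum View {
    /// Strings of at most 12 bytes are stored inside the view itself.
    Inline { len: u8, data: [u8; 12] },
    /// Longer strings keep a 4-byte prefix plus a location in a data buffer.
    Ref { len: u32, prefix: [u8; 4], buffer: u32, offset: u32 },
}

/// Simplified model of a string view array: views plus shared data buffers.
struct ViewArray {
    views: Vec<View>,
    buffers: Vec<Vec<u8>>,
}

impl ViewArray {
    /// Build an array from string slices, using a single data buffer.
    fn from_strs(values: &[&str]) -> Self {
        let mut views = Vec::with_capacity(values.len());
        let mut data = Vec::new();
        for v in values {
            let bytes = v.as_bytes();
            if bytes.len() <= 12 {
                let mut inline = [0u8; 12];
                inline[..bytes.len()].copy_from_slice(bytes);
                views.push(View::Inline { len: bytes.len() as u8, data: inline });
            } else {
                let mut prefix = [0u8; 4];
                prefix.copy_from_slice(&bytes[..4]);
                views.push(View::Ref {
                    len: bytes.len() as u32,
                    prefix,
                    buffer: 0,
                    offset: data.len() as u32,
                });
                data.extend_from_slice(bytes);
            }
        }
        ViewArray { views, buffers: vec![data] }
    }

    /// Resolve the view at index `i` back to a string slice.
    fn value(&self, i: usize) -> &str {
        let bytes: &[u8] = match &self.views[i] {
            View::Inline { len, data } => &data[..*len as usize],
            View::Ref { len, buffer, offset, .. } => {
                let start = *offset as usize;
                &self.buffers[*buffer as usize][start..start + *len as usize]
            }
        };
        std::str::from_utf8(bytes).expect("views were built from valid UTF-8")
    }

    /// Keep only the rows where `keep` is true. Only the fixed-size views
    /// are copied; the data buffers are reused without touching their bytes.
    fn filter(self, keep: &[bool]) -> Self {
        let views: Vec<View> = self
            .views
            .iter()
            .zip(keep)
            .filter(|(_, k)| **k)
            .map(|(v, _)| *v)
            .collect();
        ViewArray { views, buffers: self.buffers }
    }
}
```

   In this sketch, a filter that keeps few rows does work proportional to the 
number of rows, regardless of how long the strings are. With an offset-based 
string array, the surviving string bytes would typically be copied into a new 
contiguous buffer.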
   
   It can still be disabled if needed by setting 
`datafusion.execution.parquet.schema_force_string_view` to false.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

