zhuqi-lucas commented on issue #7350:
URL: https://github.com/apache/arrow-rs/issues/7350#issuecomment-2767941953

   Thank you @XiangpengHao @alamb. I was thinking of supporting a longer inline 
prefix for StringView comparisons, but it looks like the prefix is always fixed 
at 4 bytes, so we can't change it easily.
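
To make the 4-byte limit concrete, here is a rough, std-only sketch of the 16-byte view layout from the Arrow spec and of a prefix-only comparison. This is not arrow-rs code: `make_view`, `prefix` and `compare_by_prefix` are hypothetical helper names used only for illustration.

```rust
use std::cmp::Ordering;

/// Build a 16-byte view for `data` (illustrative helper, not an arrow-rs API).
/// Layout: bytes 0..4 hold the length; strings of up to 12 bytes are stored
/// inline in bytes 4..16; longer strings store a 4-byte prefix, a buffer
/// index, and an offset into that buffer.
fn make_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let len = data.len() as u32;
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&len.to_le_bytes());
    if data.len() <= 12 {
        // Short string: fully inlined, zero padded.
        bytes[4..4 + data.len()].copy_from_slice(data);
    } else {
        // Long string: only the first 4 bytes live in the view.
        bytes[4..8].copy_from_slice(&data[..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// Read the 4 prefix bytes as a big-endian integer, so that integer order
/// matches byte-wise lexicographic order.
fn prefix(view: u128) -> u32 {
    let bytes = view.to_le_bytes();
    u32::from_be_bytes([bytes[4], bytes[5], bytes[6], bytes[7]])
}

/// Compare two views using only their prefixes. Returns `None` when the
/// prefixes tie, meaning the full string data has to be read and compared.
fn compare_by_prefix(a: u128, b: u128) -> Option<Ordering> {
    match prefix(a).cmp(&prefix(b)) {
        Ordering::Equal => None,
        ord => Some(ord),
    }
}
```

With values that share a long common prefix (URLs, for example), `compare_by_prefix` returns `None` for almost every pair, so the comparison always falls back to the data buffers and the prefix gives no benefit.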
   
   
   
   
   > > we add a new ByteView to support an 8-byte prefix
   > 
   > I think Arrow spec says we need to do 4 bytes prefix: 
https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-view-layout
   > 
   > As you have pointed out, StringViewArray is not always better than 
StringArray, especially when the prefixes are the same.
   > 
   > But I do believe there are micro-architecture level optimizations we can 
do to improve performance, like better compiler hint, prefetching, gc tuning 
etc.
   > 
   > Another direction is probably to rewrite the FilterExec/CoalesenceExec to 
emit StringArray rather than StringViewArray, the idea is to use StringView in 
lower levels of the plan and use String in higher levels of the plan
   
   
   I agree. The linked PR uses GC as a workaround for the sort-merge 
comparison cases.
   
   
   
   
   
   > I do think theoretically StringArray is likely to be faster than 
StringViewArray for larger strings in many cases as it is more efficient (it 
has fewer indirections)
   > 
   > > Another direction is probably to rewrite the FilterExec/CoalesenceExec 
to emit StringArray rather than StringViewArray, the idea is to use StringView 
in lower levels of the plan and use String in higher levels of the plan
   > 
   > that is a very interesting idea 🤔
   
   
   For FilterExec/CoalesenceExec, interesting: this uses GC to reduce the 
overhead of FilterExec/CoalesenceExec. Maybe we can try rewriting 
FilterExec/CoalesenceExec to emit StringArray and compare the gains and losses.
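
As a rough illustration of what "emit StringArray" would involve, the std-only sketch below copies every viewed string into one contiguous values buffer with `i32` offsets, which is the StringArray layout. It is not DataFusion or arrow-rs code: `encode_view`, `append_view_bytes` and `views_to_contiguous` are hypothetical names, and the views are modelled as plain `u128` values.

```rust
/// Encode a 16-byte view (illustrative helper): length in bytes 0..4, then
/// either up to 12 inline bytes, or prefix + buffer index + offset.
fn encode_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    if data.len() <= 12 {
        bytes[4..4 + data.len()].copy_from_slice(data);
    } else {
        bytes[4..8].copy_from_slice(&data[..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// Append the bytes referenced by `view` to `out`, reading either the inline
/// data or the referenced data buffer.
fn append_view_bytes(view: u128, buffers: &[Vec<u8>], out: &mut Vec<u8>) {
    let bytes = view.to_le_bytes();
    let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    if len <= 12 {
        out.extend_from_slice(&bytes[4..4 + len]);
    } else {
        let buffer_index =
            u32::from_le_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]) as usize;
        let offset =
            u32::from_le_bytes([bytes[12], bytes[13], bytes[14], bytes[15]]) as usize;
        out.extend_from_slice(&buffers[buffer_index][offset..offset + len]);
    }
}

/// Turn views into StringArray-style (offsets, values): one contiguous values
/// buffer plus `n + 1` offsets marking where each string starts and ends.
fn views_to_contiguous(views: &[u128], buffers: &[Vec<u8>]) -> (Vec<i32>, Vec<u8>) {
    let mut offsets = Vec::with_capacity(views.len() + 1);
    let mut values = Vec::new();
    offsets.push(0i32);
    for &view in views {
        append_view_bytes(view, buffers, &mut values);
        offsets.push(values.len() as i32);
    }
    (offsets, values)
}
```

The trade-off to measure is that this copies every string once, including the short inline ones, in exchange for a single buffer with no per-value indirection afterwards.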


