alamb commented on PR #6062:
URL: https://github.com/apache/arrow-rs/pull/6062#issuecomment-2228883884

   > Here is a quick benchmark, and the result looks reasonable.
   
   I agree that this result looks reasonable given that this PR doesn't have any 
StringView-specific optimizations. It is unfortunate, but not unexpected, that 
creating a `StringView` will be slower than `StringArray` if the strings are 
copied.
   
   > Some thoughts on reusing the buffer: CSV is row format, making it 
difficult to reuse the underlying buffer because we will likely hold the entire 
file in memory. So I think it makes sense to copy the strings to new place.
   
   For some use cases (like streaming read + filter) I think it might make sense 
to reuse the buffers (the rationale being that the extra memory usage would be 
for a short period of time, and many of the rows are likely to be filtered out). 
Users could always call 
https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html#method.gc
 to copy / compact the strings if desired (or simply read as StringView).
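   
   To illustrate the trade-off, here is a rough, std-only sketch of the idea. 
It is a toy model and not how arrow-rs implements views: `ToyViewArray`, `View`, 
`from_lines`, `filter`, and `buffer_len` are hypothetical names made up for this 
example, and its `gc` only mimics the spirit of the real 
`GenericByteViewArray::gc`. Filtering keeps the whole input buffer alive, and 
the compaction step copies just the surviving strings.

```rust
use std::sync::Arc;

/// Hypothetical, simplified view: a byte range into a shared buffer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct View {
    offset: usize,
    len: usize,
}

/// Toy model (an assumption for illustration): many views over one buffer.
#[derive(Clone, Debug)]
struct ToyViewArray {
    buffer: Arc<Vec<u8>>,
    views: Vec<View>,
}

impl ToyViewArray {
    /// Copy the input once, then point one view at each non-empty line.
    /// No per-string copies are made.
    fn from_lines(data: &str) -> Self {
        let buffer = Arc::new(data.as_bytes().to_vec());
        let mut views = Vec::new();
        let mut offset = 0;
        for line in data.split('\n') {
            if !line.is_empty() {
                views.push(View { offset, len: line.len() });
            }
            offset += line.len() + 1; // skip the '\n' separator
        }
        Self { buffer, views }
    }

    /// Resolve the i-th view to the string it points at.
    fn value(&self, i: usize) -> &str {
        let v = self.views[i];
        std::str::from_utf8(&self.buffer[v.offset..v.offset + v.len]).unwrap()
    }

    /// Zero-copy filter: drops views but keeps the entire buffer alive.
    fn filter(&self, keep: impl Fn(&str) -> bool) -> Self {
        let views = (0..self.views.len())
            .filter(|&i| keep(self.value(i)))
            .map(|i| self.views[i])
            .collect();
        Self { buffer: Arc::clone(&self.buffer), views }
    }

    /// Compaction: copy only the referenced bytes into a fresh buffer.
    fn gc(&self) -> Self {
        let mut buffer = Vec::new();
        let mut views = Vec::with_capacity(self.views.len());
        for i in 0..self.views.len() {
            let s = self.value(i);
            views.push(View { offset: buffer.len(), len: s.len() });
            buffer.extend_from_slice(s.as_bytes());
        }
        Self { buffer: Arc::new(buffer), views }
    }

    /// Number of bytes held by the backing buffer.
    fn buffer_len(&self) -> usize {
        self.buffer.len()
    }
}
```

   In this model, a filter that keeps one short row out of a large input still 
pins the whole buffer until `gc` is called, which is the short-lived extra 
memory usage described above.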
   
   For this PR I think starting simple is good, and we can file a ticket to 
optimize the implementation later.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
