alamb opened a new issue, #5374:
URL: https://github.com/apache/arrow-rs/issues/5374

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   Recently, two new types were added to the Arrow format that make it more 
suitable for certain types of operations on strings.
   
   Specifically, when doing filtering / take with string data, creating a new 
`Utf8Array` requires copying the strings to a new, packed binary buffer. The 
`VariableSizeBinaryView` layout was designed to remove this limitation and was 
recently added to the Arrow spec.
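
To make the copying cost concrete, here is a minimal sketch in plain Rust. It does not use the arrow-rs API: `PackedStrings`, `ViewStrings`, and `View` are hypothetical names, and the view is simplified to an (offset, length) pair rather than the 16-byte view from the spec. It only illustrates that `take` on a packed layout rebuilds the data buffer, while `take` on a view layout copies fixed-size views and shares the buffer.

```rust
use std::sync::Arc;

/// Packed layout in the style of `Utf8Array` (hypothetical type):
/// one contiguous data buffer plus `len + 1` offsets into it.
struct PackedStrings {
    offsets: Vec<usize>,
    data: Vec<u8>,
}

impl PackedStrings {
    fn from_strs(values: &[&str]) -> Self {
        let mut offsets = vec![0];
        let mut data = Vec::new();
        for v in values {
            data.extend_from_slice(v.as_bytes());
            offsets.push(data.len());
        }
        Self { offsets, data }
    }

    fn value(&self, i: usize) -> &str {
        std::str::from_utf8(&self.data[self.offsets[i]..self.offsets[i + 1]]).unwrap()
    }

    /// `take` has to build a new packed buffer, so every selected
    /// string's bytes are copied.
    fn take(&self, indices: &[usize]) -> Self {
        let mut offsets = vec![0];
        let mut data = Vec::new();
        for &i in indices {
            data.extend_from_slice(&self.data[self.offsets[i]..self.offsets[i + 1]]);
            offsets.push(data.len());
        }
        Self { offsets, data }
    }
}

/// Simplified view (assumption: the real view is 16 bytes and may inline
/// short strings): a location inside a shared data buffer.
#[derive(Clone, Copy)]
struct View {
    offset: usize,
    len: usize,
}

/// View layout (hypothetical type): fixed-size views over a shared buffer.
struct ViewStrings {
    views: Vec<View>,
    data: Arc<Vec<u8>>,
}

impl ViewStrings {
    fn from_strs(values: &[&str]) -> Self {
        let mut views = Vec::new();
        let mut data = Vec::new();
        for v in values {
            views.push(View { offset: data.len(), len: v.len() });
            data.extend_from_slice(v.as_bytes());
        }
        Self { views, data: Arc::new(data) }
    }

    fn value(&self, i: usize) -> &str {
        let v = self.views[i];
        std::str::from_utf8(&self.data[v.offset..v.offset + v.len]).unwrap()
    }

    /// `take` copies only the views; the string bytes stay where they are
    /// and the buffer is shared by reference count.
    fn take(&self, indices: &[usize]) -> Self {
        Self {
            views: indices.iter().map(|&i| self.views[i]).collect(),
            data: Arc::clone(&self.data),
        }
    }
}
```

In this sketch, taking indices `[2, 0]` from the packed form allocates a fresh data buffer, while the same `take` on the view form returns a value whose `data` points at the original allocation.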
   
   **Describe the solution you'd like**
   I would like to implement `StringViewArray` and `BinaryViewArray` following 
the spec:
https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-view-layout
   
https://github.com/apache/arrow/blob/3fe598ae4dfd7805ab05452dd5ed4b0d6c97d8d5/format/Schema.fbs#L187-L205
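
For reference, the 16-byte view described in the spec can be sketched as below. This is an illustration only, not a proposed arrow-rs API: `make_view`, `read_view`, and `read_u32` are hypothetical helpers. Per the spec, a view starts with a 4-byte length; values of 12 bytes or fewer are stored inline (zero padded), and longer values store a 4-byte prefix, a buffer index, and an offset into that buffer. The sketch assumes little-endian encoding and uses `u32` where the spec uses signed 32-bit integers.

```rust
/// Read a little-endian u32 from the first four bytes of `bytes`.
fn read_u32(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Encode one value as a 16-byte view (hypothetical helper).
/// `buffer_index` and `offset` say where a long value lives in the data
/// buffers; they are ignored for values that fit inline.
fn make_view(value: &[u8], buffer_index: u32, offset: u32) -> [u8; 16] {
    let mut view = [0u8; 16];
    let len = value.len() as u32;
    view[0..4].copy_from_slice(&len.to_le_bytes());
    if value.len() <= 12 {
        // Short value: bytes are stored inline, the rest stays zero padded.
        view[4..4 + value.len()].copy_from_slice(value);
    } else {
        // Long value: 4-byte prefix, then buffer index, then offset.
        view[4..8].copy_from_slice(&value[..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    view
}

/// Resolve a view back to its bytes (hypothetical helper).
fn read_view<'a>(view: &'a [u8; 16], buffers: &'a [Vec<u8>]) -> &'a [u8] {
    let len = read_u32(&view[0..4]) as usize;
    if len <= 12 {
        &view[4..4 + len]
    } else {
        let buffer_index = read_u32(&view[8..12]) as usize;
        let offset = read_u32(&view[12..16]) as usize;
        &buffers[buffer_index][offset..offset + len]
    }
}
```

Because every view has the same size, kernels such as `filter` and `take` can produce their output by copying views and keeping the data buffers as they are.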
   
   Initially, I would suggest we get the basic types in place:
   - [ ] `DataType`
   - [ ] `Array` implementations, layout, and basic construction
   
   Then, as follow-on PRs, add support to key kernels:
   - [ ] `cast` (to/from StringArray / DictionaryArray)
   - [ ] `filter`
   - [ ] `take`
   
   
   **Describe alternatives you've considered**
   There are some commits with an early prototype from @tustvold linked from 
https://github.com/apache/arrow-rs/issues/4253. Maybe we can pull that code 
into a PR. 
   
   
   **Additional context**
   Polars recently implemented it in Rust, so that can serve as motivation.
   Blog post: https://pola.rs/posts/polars-string-type/
   https://twitter.com/RitchieVink/status/1749466861069115790 
   
   Related PRs:
   https://github.com/pola-rs/polars/pull/13748 
   https://github.com/pola-rs/polars/pull/13839
   https://github.com/pola-rs/polars/pull/13489
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
