alamb opened a new issue, #5374: URL: https://github.com/apache/arrow-rs/issues/5374
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Recently two new types were added to the Arrow format that make it more suitable for certain operations on strings. Specifically, when doing `filter` / `take` with string data, creating a new `Utf8Array` requires copying the strings into a new, packed binary buffer. The `VariableSizeBinaryView` layout was designed to solve this limitation and was recently added to the Arrow spec.

**Describe the solution you'd like**

I would like to implement `StringViewArray` and `BinaryViewArray` following the spec:

- https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-view-layout
- https://github.com/apache/arrow/blob/3fe598ae4dfd7805ab05452dd5ed4b0d6c97d8d5/format/Schema.fbs#L187-L205

Initially, I would suggest we get the basic types in place:

- [ ] `DataType`
- [ ] `Array` implementations, layout, and basic construction

Then, as follow-on PRs, add support to key kernels:

- [ ] `cast` (to/from `StringArray` / `DictionaryArray`)
- [ ] `filter`
- [ ] `take`

**Describe alternatives you've considered**

There are some commits with an early prototype from @tustvold linked from https://github.com/apache/arrow-rs/issues/4253. Maybe we can pull that code into a PR.

**Additional context**

Polars implemented it recently in Rust, so that can serve as motivation:

- Blog post: https://pola.rs/posts/polars-string-type/
- https://twitter.com/RitchieVink/status/1749466861069115790

Related PRs:

- https://github.com/pola-rs/polars/pull/13748
- https://github.com/pola-rs/polars/pull/13839
- https://github.com/pola-rs/polars/pull/13489
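To make the motivation concrete, here is a minimal std-only sketch of the view layout from the linked spec. Each element is a fixed 16-byte view whose first 4 bytes hold the length. Values of up to 12 bytes are stored inline, and longer values keep a 4-byte prefix plus a (buffer index, offset) pair into a shared data buffer. `StringViewSketch`, `from_strs`, `take`, and `filter` are hypothetical names, not the arrow-rs API. The sketch assumes a single data buffer, omits the null bitmap, and uses `Rc<Vec<u8>>` as a stand-in for Arrow's reference-counted buffers.

```rust
use std::rc::Rc;

/// Values up to this many bytes are stored inline in the view itself.
const MAX_INLINE: usize = 12;

/// One 16-byte view: bytes 0..4 hold the length (little-endian u32).
/// Inline:    bytes 4..16 hold the value, zero-padded.
/// Reference: bytes 4..8 prefix, 8..12 buffer index, 12..16 offset.
type View = [u8; 16];

fn read_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Hypothetical, simplified string-view array (no null bitmap).
struct StringViewSketch {
    views: Vec<View>,
    /// Rc stands in for Arrow's shared buffers: cloning the handles
    /// shares the string bytes instead of copying them.
    buffers: Vec<Rc<Vec<u8>>>,
}

impl StringViewSketch {
    fn from_strs(values: &[&str]) -> Self {
        let mut data = Vec::new();
        let mut views = Vec::with_capacity(values.len());
        for s in values {
            let b = s.as_bytes();
            let mut view = [0u8; 16];
            view[0..4].copy_from_slice(&(b.len() as u32).to_le_bytes());
            if b.len() <= MAX_INLINE {
                // Short value: copy it into the view, no data buffer needed.
                view[4..4 + b.len()].copy_from_slice(b);
            } else {
                // Long value: 4-byte prefix, then buffer 0 and its offset.
                view[4..8].copy_from_slice(&b[0..4]);
                view[8..12].copy_from_slice(&0u32.to_le_bytes());
                view[12..16].copy_from_slice(&(data.len() as u32).to_le_bytes());
                data.extend_from_slice(b);
            }
            views.push(view);
        }
        StringViewSketch { views, buffers: vec![Rc::new(data)] }
    }

    fn len(&self) -> usize {
        self.views.len()
    }

    fn value(&self, i: usize) -> &str {
        let view = &self.views[i];
        let len = read_u32(&view[0..4]) as usize;
        let bytes = if len <= MAX_INLINE {
            &view[4..4 + len]
        } else {
            let buf = read_u32(&view[8..12]) as usize;
            let off = read_u32(&view[12..16]) as usize;
            &self.buffers[buf].as_slice()[off..off + len]
        };
        std::str::from_utf8(bytes).expect("views only point at valid UTF-8")
    }

    /// `take`: gather 16-byte views by index; string bytes are not copied.
    fn take(&self, indices: &[usize]) -> StringViewSketch {
        StringViewSketch {
            views: indices.iter().map(|&i| self.views[i]).collect(),
            buffers: self.buffers.clone(), // clones Rc handles only
        }
    }

    /// `filter`: keep views whose mask entry is true; buffers are shared.
    fn filter(&self, keep: &[bool]) -> StringViewSketch {
        StringViewSketch {
            views: self.views.iter().zip(keep).filter(|&(_, &k)| k).map(|(v, _)| *v).collect(),
            buffers: self.buffers.clone(),
        }
    }
}

fn main() {
    let arr = StringViewSketch::from_strs(&["short", "a string longer than twelve bytes", "hi"]);
    let taken = arr.take(&[1, 0]);
    let filtered = arr.filter(&[true, false, true]);
    for i in 0..taken.len() {
        println!("take[{}] = {}", i, taken.value(i));
    }
    for i in 0..filtered.len() {
        println!("filter[{}] = {}", i, filtered.value(i));
    }
    // The results reference the original data buffer rather than a copy.
    println!("shared buffer: {}", Rc::ptr_eq(&arr.buffers[0], &taken.buffers[0]));
}
```

Because every view has the same width, `take` and `filter` in this sketch only gather fixed-size views, while long strings stay where they are in the shared buffer. This is the copy that a packed offsets-plus-data layout cannot avoid. The buffer index exists so that views can point into several buffers, for example after concatenating arrays. A real implementation would also need validity handling and buffer garbage collection, which are not shown here.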
