Hello all,

In the PR to add support for Utf8View to the C++ implementation, I've taken the approach of allowing raw pointer views [1] alongside the index/offset views described in the spec [2]. This was done to ease communication with other engines such as DuckDB and Velox, whose native string representation is the raw pointer view. For the library to be usable as a utility for writing IPC files and for other operations on Arrow-formatted data, it is useful for it to be able to import raw pointer arrays directly, even if it immediately converts them to the index/offset representation.
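
To make the two layouts concrete, here is a rough sketch of what importing a raw pointer view and immediately converting it to an index/offset view could look like. This is not the code from the PR: `View`, `MakeRawPointerView`, `ToIndexOffsetView` and `ReadIndexOffsetView` are hypothetical names, and the raw pointer layout assumed here (4-byte length, 4-byte prefix, pointer) is my illustration of the representation discussed above. Strings of 12 bytes or fewer are stored inline and are identical in both representations, so only longer strings need their bytes copied into a data buffer.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical sketch, not Arrow's C++ API. A view is 16 opaque bytes.
using View = std::array<unsigned char, 16>;
constexpr int32_t kInlineSize = 12;
static_assert(sizeof(const char*) <= 8, "pointer must fit in bytes 8..15");

// Build a raw pointer view:
//   size <= 12: [size:4][inline bytes, zero padded:12]
//   size  > 12: [size:4][prefix:4][pointer:8]
View MakeRawPointerView(const char* data, int32_t size) {
  View v{};  // zero-initialized
  std::memcpy(v.data(), &size, 4);
  if (size <= kInlineSize) {
    std::memcpy(v.data() + 4, data, static_cast<size_t>(size));
  } else {
    std::memcpy(v.data() + 4, data, 4);               // prefix
    std::memcpy(v.data() + 8, &data, sizeof(data));   // raw pointer
  }
  return v;
}

// Convert to the index/offset layout:
//   size  > 12: [size:4][prefix:4][buffer index:4][offset:4]
// Out-of-line bytes are appended to *buffer, which plays the role of
// the data buffer at position buffer_index.
View ToIndexOffsetView(const View& raw, std::string* buffer,
                       int32_t buffer_index) {
  int32_t size;
  std::memcpy(&size, raw.data(), 4);
  if (size <= kInlineSize) return raw;  // inline views are unchanged

  const char* ptr;
  std::memcpy(&ptr, raw.data() + 8, sizeof(ptr));
  const int32_t offset = static_cast<int32_t>(buffer->size());
  buffer->append(ptr, static_cast<size_t>(size));

  View out = raw;  // keeps size and prefix
  std::memcpy(out.data() + 8, &buffer_index, 4);
  std::memcpy(out.data() + 12, &offset, 4);
  return out;
}

// Resolve an index/offset view against its data buffers.
std::string ReadIndexOffsetView(const View& v,
                                const std::vector<std::string>& buffers) {
  int32_t size;
  std::memcpy(&size, v.data(), 4);
  if (size <= kInlineSize) {
    return std::string(reinterpret_cast<const char*>(v.data() + 4),
                       static_cast<size_t>(size));
  }
  int32_t buffer_index, offset;
  std::memcpy(&buffer_index, v.data() + 8, 4);
  std::memcpy(&offset, v.data() + 12, 4);
  return buffers[static_cast<size_t>(buffer_index)].substr(
      static_cast<size_t>(offset), static_cast<size_t>(size));
}
```

The sketch copies the out-of-line bytes on conversion, which is the simplest safe choice when the lifetime of the memory behind the pointers is not under the importer's control.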

However, there has been an objection in review [3], since the raw pointer representation is not part of the official format. Since the data visitation utilities are generic, IMHO this hybrid approach does not add significantly to the complexity of the C++ library, and I feel the aforementioned interoperability is a high priority when adding this feature to the C++ library. It's worth noting that this interoperability has been a stated goal of the Utf8View type since its original proposal [4] and throughout the discussion of its adoption [5].

Sincerely,
Ben Kietzman

[1]: https://github.com/apache/arrow/pull/37792/files#diff-814ac6f43345f7d2f33e9249a1abf092c8078c62ec44cd782c49b676b94ec302R731-R752
[2]: https://github.com/apache/arrow/blob/9d6d501/docs/source/format/Columnar.rst#L369-L379
[3]: https://github.com/apache/arrow/pull/37792#discussion_r1336010665
[4]: https://lists.apache.org/thread/49qzofswg1r5z7zh39pjvd1m2ggz2kdq
[5]: https://lists.apache.org/thread/8mofy7khfvy3g1m9pmjshbty3cmvb4w4