For better or for worse the Rust implementation requires the underlying buffer is UTF-8 including null slots, as this allows returning the buffer as a native string type, which in turn allows kernels to use Rust's native string functionality. Whilst I agree the specification is ambiguous on this note, this interpretation doesn't appear to have caused issues so far.
On 2 July 2023 13:07:20 BST, Antoine Pitrou <anto...@python.org> wrote: > > >Le 02/07/2023 à 14:00, Raphael Taylor-Davies a écrit : >> >> More an observation than an issue, but UTF-8 validation for StringArray can >> be done very efficiently by first verifying the entire buffer, and then >> verifying the offsets correspond to the start of a UTF-8 codepoint. > >Caveat: null slots could potentially contain invalid UTF-8 data. Not likely of >course, but it should probably not be an error. > >That said, yes, it is a smart strategy for the common case! > >Regards > >Antoine.