On Tue, 8 Feb 2022 at 17:37, Jorge Cardoso Leitão <jorgecarlei...@gmail.com>
wrote:

> ...
>
> Wrt binary, imo the challenge is:
> * we state that backward incompatible changes to the C data interface
> require a new spec [1]
>

Note that this discussion wouldn't change anything about the C Data
Interface spec itself. The discussion is only about the *value* that is put
in one of the key-value metadata fields. The C Data Interface spec defines
how the metadata needs to be stored, but doesn't specify anything about the
actual value of one of the key-value metadata fields.
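
To make the distinction concrete, here is a rough sketch in Python (not
taken from any Arrow implementation) of a reader that follows the storage
layout but makes no assumption about the value. It assumes the layout as I
understand it from the spec: a native-endian int32 pair count, then for each
pair an int32 key length, the key bytes, an int32 value length, and the value
bytes. The names decode_metadata, encode_metadata and value_as_text are
hypothetical helpers made up for this example.

```python
import struct
from typing import List, Optional, Tuple


def encode_metadata(pairs: List[Tuple[bytes, bytes]]) -> bytes:
    """Build a metadata blob (assumed layout: int32 count, then length-prefixed pairs)."""
    out = [struct.pack("=i", len(pairs))]
    for key, value in pairs:
        out.append(struct.pack("=i", len(key)) + key)
        out.append(struct.pack("=i", len(value)) + value)
    return b"".join(out)


def decode_metadata(buf: bytes) -> List[Tuple[bytes, bytes]]:
    """Decode a metadata blob into (key, value) pairs, keeping both as raw bytes."""
    offset = 0

    def read_int32() -> int:
        nonlocal offset
        (n,) = struct.unpack_from("=i", buf, offset)
        offset += 4
        return n

    def read_bytes() -> bytes:
        nonlocal offset
        length = read_int32()
        if length < 0 or offset + length > len(buf):
            raise ValueError("corrupt metadata length")
        chunk = buf[offset:offset + length]
        offset += length
        return chunk

    pairs = []
    for _ in range(read_int32()):
        key = read_bytes()
        value = read_bytes()
        pairs.append((key, value))
    return pairs


def value_as_text(value: bytes) -> Optional[str]:
    """Return the value as text only if it is valid UTF-8, otherwise None."""
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

With this shape, a value such as b"\xff\xfe" round-trips through the storage
format unchanged, and only the optional text view reports that it is not
valid UTF-8.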


> * we state that the metadata is a binary string [2]
> * a valid string is a subset of all valid byte arrays and thus removing
> "*string*" from the spec is backward incompatible
>
> If we write invalid utf8 to it and a reader assumes utf8 when reading it,
> we trigger undefined behavior.
>
> I was a bit surprised by ARROW-15613 - my understanding is that the C++
> implementation is not following the spec, and if we at arrow2 were not
> checking for utf8, we would be exposing a vulnerability (at least according
> to Rust's standards). We just checked it out of luck (it is O(1), so why
> not).
>

Yes, the C++ implementation is indeed not following the spec. See the
"[DISCUSS] Binary Values in Key value pairs" thread (
https://lists.apache.org/thread/blmj0cgv34dgdxqd3ow60ln68khnz0qr). Let's
maybe keep this part of the discussion there?
