alamb commented on issue #7423: URL: https://github.com/apache/arrow-rs/issues/7423#issuecomment-2894597621
@scovich had some great comments on https://github.com/apache/arrow-rs/pull/7452#discussion_r2095809708 that I wanted to copy/paste into this ticket perhaps for wider discussion: > I have not reviewed the code carefully at all yet, and what follows is a general observation based on the inherent nature of variant data and rust notions of safety: > > It will be really tempting to have "efficient" code that e.g. uses [from_utf8_unchecked](https://doc.rust-lang.org/std/primitive.str.html#method.from_utf8_unchecked) to extract a `&str` from a `&[u8]`, or to use indexing operations like `v[10]` to extract bytes. But variant data is generally untrusted user input and whatever `Variant` struct/enum we define will become the first -- and often only -- line of defense against malicious or malformed input. > > Hopefully we can code carefully, with the goal that sizes and/or contents of metadata and value slices will never cause a panic? > > Additinoally, it seems like we have a few choices for values such as strings and decimals even a right-sized byte slice can contain invalid values: > > 1. Return obviously unvalidated values, e.g. `&[u8]` instead of `&str` for strings, and `&[u8]` instead of whatever `VariantDecimal` struct we might otherwise define -- leaving the user responsible to finish the conversion as (un)safely as they deem prudent. > 2. Return ostensibly validated values, with (safe) checked and (unsafe) unchecked constructors and/or getters that let the user choose the one they deem appropriate. > > I personally favor the latter approach (safe and easy to use, even if not always the absolutely max efficient), but the topic probably needs a wider discussion. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
