alamb opened a new issue, #9157: URL: https://github.com/apache/arrow-rs/issues/9157
**Describe the bug**

@jhorstmann points out https://github.com/apache/arrow-rs/pull/9058/changes#r2683034659

> Unrelated to the performance improvement: I think this also needs to assert that data_type equals T::DATA_TYPE, otherwise it allows unchecked casting from binary to string without utf8 validation.

Basically, by (mis)using safe APIs it is possible to convert a binary view array to Utf8View and bypass the UTF-8 check.

**To Reproduce**

```rust
#[test]
#[should_panic(expected = "Invalid UTF-8")]
fn invalid_casting_from_array_data() {
    let array = GenericByteViewArray::<BinaryViewType>::from(vec![
        b"aaaaaaaaaaaaaaaaaaaaaaaaaaa" as &[u8],
        &[
            0xf0, 0x80, 0x80, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
            0x00, 0x00,
        ],
        b"good",
    ]);
    // values 0 and 2 are valid UTF-8
    assert!(String::from_utf8(array.value(0).to_vec()).is_ok());
    // value 1 is invalid UTF-8
    assert!(String::from_utf8(array.value(1).to_vec()).is_err());
    assert!(String::from_utf8(array.value(2).to_vec()).is_ok());
    // Should not be able to cast to StringViewArray due to invalid UTF-8
    let array_data: arrow_data::ArrayData = array.into();
    let _ = StringViewArray::from(array_data);
}
```

**Expected behavior**

The conversion should panic given the incorrect data type.
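The check suggested in the quoted comment (reject `ArrayData` whose `data_type` differs from `T::DATA_TYPE`) can be sketched in isolation. The block below is a minimal, self-contained model and not the arrow-rs implementation: the `DataType`, `ArrayData`, `ByteViewType`, and `GenericByteViewArray` definitions are simplified stand-ins that only borrow the real names, and `try_from_array_data` and `len` are hypothetical helpers introduced for illustration.

```rust
use std::marker::PhantomData;

// Simplified stand-ins for the arrow types. The names mirror arrow-rs, but
// these definitions are assumptions made only to illustrate the check.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    BinaryView,
    Utf8View,
}

/// Model of `ArrayData`: a logical type tag plus the raw values.
struct ArrayData {
    data_type: DataType,
    values: Vec<Vec<u8>>,
}

/// Model of `ByteViewType`: each marker type declares its logical data type.
trait ByteViewType {
    const DATA_TYPE: DataType;
}

struct BinaryViewType;
struct StringViewType;

impl ByteViewType for BinaryViewType {
    const DATA_TYPE: DataType = DataType::BinaryView;
}

impl ByteViewType for StringViewType {
    const DATA_TYPE: DataType = DataType::Utf8View;
}

struct GenericByteViewArray<T: ByteViewType> {
    values: Vec<Vec<u8>>,
    _marker: PhantomData<T>,
}

impl<T: ByteViewType> GenericByteViewArray<T> {
    /// Hypothetical constructor: refuses data whose type tag does not match
    /// `T::DATA_TYPE`, so binary data cannot be relabelled as a string array
    /// without passing through UTF-8 validation.
    fn try_from_array_data(data: ArrayData) -> Result<Self, String> {
        if data.data_type != T::DATA_TYPE {
            return Err(format!(
                "expected data type {:?}, got {:?}",
                T::DATA_TYPE,
                data.data_type
            ));
        }
        Ok(Self {
            values: data.values,
            _marker: PhantomData,
        })
    }

    /// Number of values held by the array.
    fn len(&self) -> usize {
        self.values.len()
    }
}
```

In this model, building a `GenericByteViewArray::<StringViewType>` from data tagged `DataType::BinaryView` returns an `Err`, while building a `GenericByteViewArray::<BinaryViewType>` from the same data succeeds. A panicking `From` conversion, as the expected behavior above describes, could wrap such a check with an `assert_eq!` instead of returning a `Result`.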
