alamb opened a new issue, #6058: URL: https://github.com/apache/arrow-rs/issues/6058
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Part of https://github.com/apache/arrow-rs/issues/5374

@XiangpengHao implemented optimized row format --> ByteView (StringView / BinaryView) encoding/decoding in https://github.com/apache/arrow-rs/issues/5945 / https://github.com/apache/arrow-rs/pull/6044

It also adds benchmarks so we can test 🎉

However, as mentioned in https://github.com/apache/arrow-rs/pull/6044/files#r1676804033, if we know that the `Row` value was created from valid utf8 values, re-validating utf8 when decoding is unnecessary.

**Describe the solution you'd like**

Consider an API that would allow skipping utf8 validation.

This would need to be justified by benchmarks showing that it makes a significant difference in performance.

**Describe alternatives you've considered**

Perhaps it would be an `unsafe` option on the [RowConverter](https://docs.rs/arrow-row/52.1.0/arrow_row/struct.RowConverter.html):

```rust
let converter = RowConverter::new(...);
// Safety: only decoding Rows that came from valid String arrays
let converter = unsafe { converter.with_validate_utf8(false) };
```

**Additional context**
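To make the trade-off concrete, here is a minimal standard-library sketch of the pattern being proposed. `Decoder`, its `decode` method, and the `validate_utf8` field are hypothetical stand-ins for illustration only and are not part of `arrow-row`; the real `RowConverter` does not necessarily work this way. The sketch only shows how an `unsafe` builder option could switch the string path from `std::str::from_utf8` to `std::str::from_utf8_unchecked`.

```rust
use std::str::Utf8Error;

/// Hypothetical stand-in for the string path of a row decoder.
/// This is NOT the arrow-row API; it only illustrates the proposed option.
struct Decoder {
    validate_utf8: bool,
}

impl Decoder {
    /// Validation is on by default, so the safe path stays the default.
    fn new() -> Self {
        Self { validate_utf8: true }
    }

    /// Hypothetical builder mirroring the `with_validate_utf8` idea above.
    ///
    /// # Safety
    /// If `validate` is `false`, every byte slice later passed to `decode`
    /// must already be valid UTF-8.
    unsafe fn with_validate_utf8(mut self, validate: bool) -> Self {
        self.validate_utf8 = validate;
        self
    }

    /// Interprets `bytes` as a string, validating only when configured to.
    fn decode<'a>(&self, bytes: &'a [u8]) -> Result<&'a str, Utf8Error> {
        if self.validate_utf8 {
            // Checked path: scans every byte before returning the &str.
            std::str::from_utf8(bytes)
        } else {
            // SAFETY: the caller of `with_validate_utf8(false)` promised
            // that all inputs are valid UTF-8.
            Ok(unsafe { std::str::from_utf8_unchecked(bytes) })
        }
    }
}
```

Putting the `unsafe` on the builder rather than on each decode call means the caller states the invariant once, at the point where they know the rows came from valid string arrays. Whether skipping the check is worth it would still have to be shown by the benchmarks mentioned above.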
