Dandandan commented on code in PR #9236:
URL: https://github.com/apache/arrow-rs/pull/9236#discussion_r2713707489
##########
parquet/src/arrow/array_reader/byte_view_array.rs:
##########
@@ -373,32 +383,33 @@ impl ByteViewArrayDecoderPlain {
// The implementation keeps a water mark `utf8_validation_begin` to track the beginning of the buffer that is not validated.
// If the length is smaller than 128, then we continue to next string.
// If the length is larger than 128, then we validate the buffer before the length bytes, and move the water mark to the beginning of next string.
- if len < 128 {
- // fast path, move to next string.
- // the len bytes are valid utf8.
- } else {
+ if len >= 128 {
// unfortunately, the len bytes may not be valid utf8, we need to wrap up and validate everything before it.
- check_valid_utf8(unsafe {
- buf.get_unchecked(utf8_validation_begin..self.offset)
- })?;
+ check_valid_utf8(unsafe { buf.get_unchecked(utf8_validation_begin..offset) })?;
// move the cursor to skip the len bytes.
utf8_validation_begin = start_offset;
}
}
+ let view = make_view(
Review Comment:
I think the "builder-style" APIs are something to be avoided (as they will
result in push/element-based handling), and we could just as well use the view
`Vec` directly, as the `ViewBuffer` doesn't give us much more.
My feeling is that we probably want to use `extend` here as well (if/when the
compiler gives us good code), or perhaps write to the uninitialized memory here
with some `unsafe` if it gives enough of a speedup.
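To make the three options concrete, here is a rough, self-contained sketch
(not code from this PR). `toy_view` is a hypothetical stand-in for the real
view constructor, and the function names and the `lens` input are assumptions
chosen for illustration. It only shows the shape of each approach; whether
`extend` or the `unsafe` variant is actually faster would need benchmarking.

```rust
// Hypothetical stand-in for a view constructor; NOT arrow's `make_view`.
// Packs a length and an offset into a u128 purely for illustration.
fn toy_view(len: u32, offset: u32) -> u128 {
    (len as u128) | ((offset as u128) << 96)
}

// Push/element-based handling: every `push` performs a capacity check.
fn views_push(lens: &[u32]) -> Vec<u128> {
    let mut views = Vec::new();
    let mut offset = 0u32;
    for &len in lens {
        views.push(toy_view(len, offset));
        offset += len;
    }
    views
}

// `extend` over an iterator: the Vec can reserve from the size hint up front.
fn views_extend(lens: &[u32]) -> Vec<u128> {
    let mut views = Vec::new();
    let mut offset = 0u32;
    views.extend(lens.iter().map(|&len| {
        let view = toy_view(len, offset);
        offset += len;
        view
    }));
    views
}

// Writing directly into the uninitialized spare capacity.
fn views_uninit(lens: &[u32]) -> Vec<u128> {
    let mut views: Vec<u128> = Vec::with_capacity(lens.len());
    let mut offset = 0u32;
    // `spare_capacity_mut` exposes the unused capacity as `MaybeUninit` slots.
    for (slot, &len) in views.spare_capacity_mut().iter_mut().zip(lens) {
        slot.write(toy_view(len, offset));
        offset += len;
    }
    // SAFETY: capacity is at least `lens.len()`, and the loop above
    // initialized exactly the first `lens.len()` slots.
    unsafe { views.set_len(lens.len()) };
    views
}
```

All three functions produce the same `Vec<u128>` for the same `lens`; they
differ only in how the elements are written into the vector.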