Re: [PR] Improve parquet BinaryView / StringView decoder performance (up to -20%) [arrow-rs]

via GitHub Wed, 21 Jan 2026 09:55:44 -0800


Dandandan commented on code in PR #9236:
URL: https://github.com/apache/arrow-rs/pull/9236#discussion_r2713707489



##########
parquet/src/arrow/array_reader/byte_view_array.rs:
##########
@@ -373,32 +383,33 @@ impl ByteViewArrayDecoderPlain {
                 // The implementation keeps a water mark `utf8_validation_begin` to track the beginning of the buffer that is not validated.
                 // If the length is smaller than 128, then we continue to next string.
                 // If the length is larger than 128, then we validate the buffer before the length bytes, and move the water mark to the beginning of next string.
-                if len < 128 {
-                    // fast path, move to next string.
-                    // the len bytes are valid utf8.
-                } else {
+                if len >= 128 {
                     // unfortunately, the len bytes may not be valid utf8, we need to wrap up and validate everything before it.
-                    check_valid_utf8(unsafe {
-                        buf.get_unchecked(utf8_validation_begin..self.offset)
-                    })?;
+                    check_valid_utf8(unsafe { buf.get_unchecked(utf8_validation_begin..offset) })?;
                     // move the cursor to skip the len bytes.
                     utf8_validation_begin = start_offset;
                 }
             }
 
+            let view = make_view(

Review Comment:
   I think the "builder-style" APIs are something to be avoided (as they will 
result in push/element-based handling), and we could just as well use the view 
`Vec` directly, as the `ViewBuffer` doesn't give us much more.
   
   My feeling is that we probably want to use `extend` here as well (if/when 
the compiler gives us good code), or perhaps write to the uninitialized memory 
here with some `unsafe` if it gives enough of a speedup.
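
To make the three options concrete, here is a minimal sketch of element-wise `push`, a single `extend`, and writing into uninitialized spare capacity. It is not the PR's code: `toy_view` is a hypothetical stand-in for `make_view` that only packs a length and an offset into a `u128`, and the function names are invented for illustration. Whether `extend` or the `unsafe` variant is actually faster would need to be measured in the decoder benchmarks.

```rust
/// Hypothetical stand-in for the real view encoding (an assumption, not
/// arrow-rs's layout): length in the low 32 bits, offset in the high 32 bits.
fn toy_view(len: u32, offset: u32) -> u128 {
    (len as u128) | ((offset as u128) << 96)
}

/// Element-based handling: one `push` per value, each with a capacity check.
fn views_push(lens: &[u32]) -> Vec<u128> {
    let mut views = Vec::new();
    let mut offset = 0u32;
    for &len in lens {
        views.push(toy_view(len, offset));
        offset += len;
    }
    views
}

/// `extend` over an iterator with an exact size hint, so the Vec can
/// reserve once before writing the elements.
fn views_extend(lens: &[u32]) -> Vec<u128> {
    let mut views = Vec::new();
    let mut offset = 0u32;
    views.extend(lens.iter().map(|&len| {
        let view = toy_view(len, offset);
        offset += len;
        view
    }));
    views
}

/// Write directly into the uninitialized spare capacity, then set the length.
fn views_uninit(lens: &[u32]) -> Vec<u128> {
    let mut views: Vec<u128> = Vec::with_capacity(lens.len());
    let mut offset = 0u32;
    // `zip` stops after `lens.len()` slots even if the capacity is larger.
    for (slot, &len) in views.spare_capacity_mut().iter_mut().zip(lens) {
        slot.write(toy_view(len, offset));
        offset += len;
    }
    // SAFETY: capacity is at least `lens.len()` and exactly the first
    // `lens.len()` slots were initialized in the loop above.
    unsafe { views.set_len(lens.len()) };
    views
}
```

All three produce the same `Vec<u128>`; they differ only in how much per-element bookkeeping the compiler has to optimize away.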



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
