XiangpengHao commented on code in PR #6044:
URL: https://github.com/apache/arrow-rs/pull/6044#discussion_r1674250364


##########
arrow-row/src/variable.rs:
##########
@@ -243,6 +244,88 @@ pub fn decode_binary<I: OffsetSizeTrait>(
     unsafe { GenericBinaryArray::from(builder.build_unchecked()) }
 }
 
+fn decode_binary_view_inner(
+    rows: &mut [&[u8]],
+    options: SortOptions,
+    check_utf8: bool,
+) -> BinaryViewArray {
+    let len = rows.len();
+
+    let mut null_count = 0;
+
+    let nulls = MutableBuffer::collect_bool(len, |x| {
+        let valid = rows[x][0] != null_sentinel(options);
+        null_count += !valid as usize;
+        valid
+    });
+
+    let values_capacity: usize = rows.iter().map(|row| decoded_len(row, options)).sum();
+    let mut values = MutableBuffer::new(values_capacity);

Review Comment:
   We reserve the maximum number of bytes for this buffer. Technically, though,
the buffer only needs to hold strings longer than 12 bytes, because shorter
strings are stored inline in the view.
   
   This makes us slightly less memory efficient, but the advantage is that we
can easily do UTF-8 validation later.
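
   To make the trade-off concrete, here is a minimal std-only sketch, not the code in this PR. The names `max_capacity`, `tight_capacity`, and `concat_and_validate` are hypothetical, and plain `Vec<u8>` stands in for `MutableBuffer`. It compares the two capacity choices and shows why keeping every value in one contiguous buffer allows a single validation pass.

```rust
/// Views store strings of at most this many bytes inline (Arrow view layout).
const MAX_INLINE_LEN: usize = 12;

/// Capacity if every decoded string is written to the values buffer.
fn max_capacity(decoded_lens: &[usize]) -> usize {
    decoded_lens.iter().sum()
}

/// Capacity if only strings too long to inline are written to the buffer.
fn tight_capacity(decoded_lens: &[usize]) -> usize {
    decoded_lens
        .iter()
        .filter(|&&len| len > MAX_INLINE_LEN)
        .sum()
}

/// Hypothetical helper: concatenates all values (short and long) into one
/// buffer, then validates the whole buffer with a single `from_utf8` call.
/// Note: this sketch does not separately check that each value starts and
/// ends on a character boundary.
fn concat_and_validate(values: &[&[u8]]) -> Result<Vec<u8>, std::str::Utf8Error> {
    let lens: Vec<usize> = values.iter().map(|v| v.len()).collect();
    let mut buf = Vec::with_capacity(max_capacity(&lens));
    for v in values {
        buf.extend_from_slice(v);
    }
    std::str::from_utf8(&buf)?;
    Ok(buf)
}

fn main() {
    let values: [&[u8]; 3] = [b"short", b"a string longer than twelve bytes", b"tiny"];
    let lens: Vec<usize> = values.iter().map(|v| v.len()).collect();
    println!(
        "max capacity = {}, tight capacity = {}",
        max_capacity(&lens),
        tight_capacity(&lens)
    );
    let buf = concat_and_validate(&values).expect("valid UTF-8");
    println!("validated {} bytes in one pass", buf.len());
}
```

   With the tight capacity, short strings would never reach the buffer, so validating them would presumably need a separate per-view pass.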



