QuakeWang commented on code in PR #52:
URL: https://github.com/apache/paimon-mosaic/pull/52#discussion_r3362568554


##########
core/src/bucket_writer.rs:
##########
@@ -926,6 +1057,24 @@ fn i128_to_biginteger_bytes(val: i128) -> Vec<u8> {
     bytes[start..].to_vec()
 }
 
+fn extract_list_lengths(list_array: &ListArray) -> Int32Array {
+    let offsets = list_array.value_offsets();
+    let num_rows = list_array.len();
+    let mut lengths = Vec::with_capacity(num_rows);
+    for i in 0..num_rows {
+        lengths.push(offsets[i + 1] - offsets[i]);

Review Comment:
   For null list rows, this still records the physical offset delta as the 
length. Arrow permits null list slots to have a non-empty offset range, but 
semantically those child values should not be consumed by the logical ARRAY 
column. Combined with `flatten_list_values()` slicing the full offsets range, a 
later non-null row can read values that belonged to a null row. Please write 
length 0 for null list rows and flatten only non-null list ranges; add a 
regression test with a null list row whose offsets cover child values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to