QuakeWang commented on code in PR #52:
URL: https://github.com/apache/paimon-mosaic/pull/52#discussion_r3362568554
##########
core/src/bucket_writer.rs:
##########
@@ -926,6 +1057,24 @@ fn i128_to_biginteger_bytes(val: i128) -> Vec<u8> {
bytes[start..].to_vec()
}
+fn extract_list_lengths(list_array: &ListArray) -> Int32Array {
+ let offsets = list_array.value_offsets();
+ let num_rows = list_array.len();
+ let mut lengths = Vec::with_capacity(num_rows);
+ for i in 0..num_rows {
+ lengths.push(offsets[i + 1] - offsets[i]);
Review Comment:
For null list rows, this still records the physical offset delta as the
length. Arrow permits null list slots to have a non-empty offset range, but
semantically those child values should not be consumed by the logical ARRAY
column. Combined with `flatten_list_values()` slicing the full offsets range, a
later non-null row can read values that belonged to a null row. Please write
length 0 for null list rows and flatten only non-null list ranges; add a
regression test with a null list row whose offsets cover child values.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]