Jefffrey commented on code in PR #9275:
URL: https://github.com/apache/arrow-rs/pull/9275#discussion_r2740305226


##########
arrow-row/src/list.rs:
##########
@@ -120,17 +120,35 @@ pub unsafe fn decode<O: OffsetSizeTrait>(
     let mut offsets = Vec::with_capacity(rows.len() + 1);
     offsets.push(O::usize_as(0));
 
+    // Check if child is Null type - if so, we need special handling
+    // to count elements correctly even when they decode to 0 bytes
+    let is_null_child = matches!(&field.data_type, DataType::List(f) if 
f.data_type() == &DataType::Null)

Review Comment:
   > 1. Breaking change to the row format: We'd need to encode NullArray 
elements differently (e.g., using NON_EMPTY_SENTINEL actual data blocks, or 
changing the list terminator)
   
   We shouldn't be too worried about breaking changes, because the row format 
explicitly describes itself as potentially changing from release to release:
   
   
https://github.com/apache/arrow-rs/blob/caf1c6022c71af00ef712e9e865acfee74169f0d/arrow-row/src/lib.rs#L186-L189
   
   > 2. Backward compatibility concerns: The row format is a stable 
serialization format. Changing encoding would break compatlity with existing 
serialized data
   
   Same here
   
   > 3. Complexity: Any encoding change would need to handle both 
ascending/descending orders, nested lists, and ensure comparn semantics remain 
correct
   
   I believe the main encoding fix is ensuring `NULL` datatype is properly 
encoded as a value that is null, and not as an empty value; see my comments on 
the original issue. I feel this fix adds more complexity since it handles a bug 
in the encoding logic, but at the decoding stage.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to