alamb commented on code in PR #5525:
URL: https://github.com/apache/arrow-rs/pull/5525#discussion_r1531022665


##########
arrow-ipc/src/reader.rs:
##########
@@ -79,6 +83,16 @@ fn create_array(reader: &mut ArrayReader, field: &Field) -> 
Result<ArrayRef, Arr
                 reader.next_buffer()?,
             ],
         ),
+        BinaryView | Utf8View => {
+            let count = variadic_counts
+                .pop_front()
+                .ok_or(ArrowError::IpcError("Incorrect variadic 
count!".to_owned()))?;

Review Comment:
   I think being more explicit would help here. Maybe something like
   
   ```suggestion
                   .ok_or(ArrowError::IpcError(format!("{data_type} require at 
least one variadic count, got none"))?;
   ```



##########
arrow-ipc/src/reader.rs:
##########
@@ -79,6 +83,16 @@ fn create_array(reader: &mut ArrayReader, field: &Field) -> 
Result<ArrayRef, Arr
                 reader.next_buffer()?,
             ],
         ),
+        BinaryView | Utf8View => {
+            let count = variadic_counts
+                .pop_front()
+                .ok_or(ArrowError::IpcError("Incorrect variadic 
count!".to_owned()))?;
+            let count = count + 2; // view and null buffer.
+            let buffers = (0..count)
+                .map(|_| reader.next_buffer())
+                .collect::<Result<Vec<_>, _>>()?;
+            create_primitive_array(reader.next_node(field)?, data_type, 
&buffers)

Review Comment:
   I think we also need to check that we got the expected number of `buffers` 
(as in that `buffers.len()` is `count`)



##########
arrow-ipc/src/reader.rs:
##########
@@ -333,30 +356,39 @@ impl<'a> ArrayReader<'a> {
                     self.skip_buffer()
                 }
             }
+            Utf8View | BinaryView => {
+                let count = variadic_count
+                    .pop_front()
+                    .ok_or(ArrowError::IpcError("Incorrect variadic 
count!".to_owned()))?;

Review Comment:
   Same comments as above in `read_field`



##########
arrow-ipc/src/reader.rs:
##########
@@ -399,6 +431,13 @@ pub fn read_record_batch(
     let field_nodes = batch.nodes().ok_or_else(|| {
         ArrowError::IpcError("Unable to get field nodes from IPC 
RecordBatch".to_string())
     })?;
+
+    let mut variadic_counts: VecDeque<i64> = if let Some(v) = 
batch.variadicBufferCounts() {
+        v.iter().collect()
+    } else {
+        VecDeque::default()
+    };

Review Comment:
   (very minor) If you are into functional programming you can make this code 
more concise like this I think:
   
   ```suggestion
       let mut variadic_counts: VecDeque<i64> = batch.variadicBufferCounts()
           .map(|v| v.iter().collect())
           .unwrap_or_default();```



##########
arrow-ipc/src/writer.rs:
##########
@@ -547,6 +564,57 @@ impl IpcDataGenerator {
     }
 }
 
+fn set_variadic_buffer_counts(counts: &mut Vec<i64>, array: &dyn Array) {

Review Comment:
   Maybe a name like 
   
   ```suggestion
   fn append_variadic_buffer_counts(counts: &mut Vec<i64>, array: &dyn Array) {
   ```
   
   Would be clearer?
   
   I think it would also help to clarify that this code adds variadic buffers 
for nested types as well



##########
arrow-ipc/src/reader.rs:
##########
@@ -67,7 +67,11 @@ fn read_buffer(
 ///     - check if the bit width of non-64-bit numbers is 64, and
 ///     - read the buffer as 64-bit (signed integer or float), and
 ///     - cast the 64-bit array to the appropriate data type
-fn create_array(reader: &mut ArrayReader, field: &Field) -> Result<ArrayRef, 
ArrowError> {
+fn create_array(

Review Comment:
   Can you also please add a doc comment here explaining what happens with 
`variadic_counts` (specifically that the first value is removed for variadic 
arrays like BinaryView and Utf8View)?



##########
arrow-ipc/src/writer.rs:
##########
@@ -547,6 +564,57 @@ impl IpcDataGenerator {
     }
 }
 
+fn set_variadic_buffer_counts(counts: &mut Vec<i64>, array: &dyn Array) {
+    match array.data_type() {
+        DataType::BinaryView | DataType::Utf8View => {
+            // The spec is not clear on whether the view/null buffer should be 
included in the variadic buffer count.
+            // But from C++ impl 
https://github.com/apache/arrow/blob/b448b33808f2dd42866195fa4bb44198e2fc26b9/cpp/src/arrow/ipc/writer.cc#L477
+            // we know they are not included.
+            counts.push(array.to_data().buffers().len() as i64 - 1);
+        }
+        DataType::Struct(_) => {
+            let array = array.as_any().downcast_ref::<StructArray>().unwrap();
+            for column in array.columns() {
+                set_variadic_buffer_counts(counts, column.as_ref());
+            }
+        }
+        DataType::LargeList(_) => {
+            let array = 
array.as_any().downcast_ref::<LargeListArray>().unwrap();
+            set_variadic_buffer_counts(counts, array.values());
+        }
+        DataType::List(_) => {
+            let array = array.as_any().downcast_ref::<ListArray>().unwrap();
+            set_variadic_buffer_counts(counts, array.values());
+        }
+        DataType::FixedSizeList(_, _) => {
+            let array = 
array.as_any().downcast_ref::<FixedSizeListArray>().unwrap();
+            set_variadic_buffer_counts(counts, array.values());
+        }
+        DataType::Dictionary(kt, _) => {
+            macro_rules! set_subarray_counts {
+                ($array:expr, $counts:expr, $type:ty, $variant:ident) => {
+                    if &DataType::$variant == kt.as_ref() {
+                        let array = $array
+                            .as_any()
+                            .downcast_ref::<DictionaryArray<$type>>()
+                            .unwrap();
+                        set_variadic_buffer_counts($counts, array.values());
+                    }
+                };
+            }
+            set_subarray_counts!(array, counts, Int8Type, Int8);
+            set_subarray_counts!(array, counts, Int16Type, Int16);
+            set_subarray_counts!(array, counts, Int32Type, Int32);
+            set_subarray_counts!(array, counts, Int64Type, Int64);
+            set_subarray_counts!(array, counts, UInt8Type, UInt8);
+            set_subarray_counts!(array, counts, UInt16Type, UInt16);
+            set_subarray_counts!(array, counts, UInt32Type, UInt32);
+            set_subarray_counts!(array, counts, UInt64Type, UInt64);
+        }
+        _ => {}

Review Comment:
   Can you please change this to explicitly list all other types rather than a 
catch all `_`? That way if/when we add new data types, the compiler will flag 
the potential places to change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to