jorgecarleitao commented on a change in pull request #9388: URL: https://github.com/apache/arrow/pull/9388#discussion_r579655167
##########
File path: rust/arrow/src/array/array_list.rs
##########
@@ -116,14 +116,68 @@ impl<OffsetSize: OffsetSizeTrait> GenericListArray<OffsetSize> {
     pub fn iter<'a>(&'a self) -> GenericListArrayIter<'a, OffsetSize> {
         GenericListArrayIter::<'a, OffsetSize>::new(&self)
     }
-}
-
-impl<'a, S: OffsetSizeTrait> IntoIterator for &'a GenericListArray<S> {
-    type Item = Option<ArrayRef>;
-    type IntoIter = GenericListArrayIter<'a, S>;
 
-    fn into_iter(self) -> Self::IntoIter {
-        GenericListArrayIter::<'a, S>::new(self)
+    /// Creates a [`GenericListArray`] from an iterator of primitive values
+    /// # Example
+    /// ```
+    /// # use arrow::array::ListArray;
+    /// # use arrow::datatypes::Int32Type;
+    /// let data = vec![
+    ///    Some(vec![Some(0), Some(1), Some(2)]),
+    ///    None,
+    ///    Some(vec![Some(3), None, Some(5)]),
+    ///    Some(vec![Some(6), Some(7)]),
+    /// ];
+    /// let list_array = ListArray::from_iter_primitive::<Int32Type, _, _>(data);
+    /// println!("{:?}", list_array);
+    /// ```
+    pub fn from_iter_primitive<T, P, I>(iter: I) -> Self
+    where
+        T: ArrowPrimitiveType,
+        P: AsRef<[Option<<T as ArrowPrimitiveType>::Native>]>
+            + IntoIterator<Item = Option<<T as ArrowPrimitiveType>::Native>>,
+        I: IntoIterator<Item = Option<P>>,
+    {
+        let iterator = iter.into_iter();
+        let (lower, _) = iterator.size_hint();
+
+        let mut offsets =
+            MutableBuffer::new((lower + 1) * std::mem::size_of::<OffsetSize>());
+        let mut length_so_far = OffsetSize::zero();
+        offsets.push(length_so_far);

Review comment:
Variable-length arrays are stored as two buffers: a `values` buffer and an `offsets` buffer. Consider the `values`:

```
[1, 3, 5, 7, 9, 11]
```

When the offsets buffer is `[0, 6]`, the array has 1 item:

```
[
    [1, 3, 5, 7, 9, 11]
]
```

When the offsets buffer is `[0, 5, 6]`, the array has 2 items:

```
[
    [1, 3, 5, 7, 9]  # i.e. [0,5]
    [11]             # i.e. [5,6]
]
```

In other words, the offsets buffer "splits" the values into contiguous chunks.

The following are invariants:

1. `offsets.len() > 0`
2. `array.len() == offsets.len() - 1`
3. `offsets[i+1] >= offsets[i]` for all `i` in bounds
4. `offsets[array.len()] == values.len()`

If any of these does not hold, the code either panics (cases 1 and 3) or results in out-of-bounds accesses (cases 2 and 4).
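The four invariants and the "splitting" behaviour can be sketched with plain slices. This is only an illustration under simplifying assumptions: it uses `usize` offsets and an `i32` values slice instead of Arrow buffers, and `check_offsets` and `split_by_offsets` are hypothetical helpers, not functions from the `arrow` crate.

```rust
/// Hypothetical helper: checks the four invariants for a list array
/// with `array_len` items, returning a description of the first violation.
fn check_offsets(
    offsets: &[usize],
    values_len: usize,
    array_len: usize,
) -> Result<(), String> {
    // Invariant 1: the offsets buffer is never empty.
    if offsets.is_empty() {
        return Err("invariant 1: offsets must not be empty".to_string());
    }
    // Invariant 2: one more offset than there are items.
    if array_len != offsets.len() - 1 {
        return Err("invariant 2: array.len() != offsets.len() - 1".to_string());
    }
    // Invariant 3: offsets never decrease.
    if offsets.windows(2).any(|w| w[1] < w[0]) {
        return Err("invariant 3: offsets must be non-decreasing".to_string());
    }
    // Invariant 4: the last offset covers all values.
    // The index is in bounds because invariant 2 was checked above.
    if offsets[array_len] != values_len {
        return Err("invariant 4: last offset != values.len()".to_string());
    }
    Ok(())
}

/// Hypothetical helper: splits `values` into the contiguous chunks
/// described by consecutive pairs of offsets.
fn split_by_offsets<'a>(values: &'a [i32], offsets: &[usize]) -> Vec<&'a [i32]> {
    offsets.windows(2).map(|w| &values[w[0]..w[1]]).collect()
}

fn main() {
    let values = [1, 3, 5, 7, 9, 11];
    let offsets = [0, 5, 6];

    // Validate before slicing, so the range indexing cannot go out of bounds.
    check_offsets(&offsets, values.len(), 2).expect("offsets should be valid");

    let items = split_by_offsets(&values, &offsets);
    println!("{:?}", items); // prints [[1, 3, 5, 7, 9], [11]]
}
```

Checking the invariants before slicing mirrors the point of the comment: with valid offsets every range `offsets[i]..offsets[i+1]` is well-formed and lies within `values`.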