HippoBaro commented on code in PR #9848:
URL: https://github.com/apache/arrow-rs/pull/9848#discussion_r3494363903


##########
parquet/src/arrow/array_reader/list_array.rs:
##########
@@ -97,121 +108,81 @@ impl<OffsetSize: OffsetSizeTrait> ArrayReader for ListArrayReader<OffsetSize> {
             .get_rep_levels()
             .ok_or_else(|| general_err!("item_reader rep levels are None."))?;
 
-        if OffsetSize::from_usize(next_batch_array.len()).is_none() {
-            return Err(general_err!(
-                "offset of {} would overflow list array",
-                next_batch_array.len()
-            ));
+        if def_levels.is_empty() {
+            return Ok(new_empty_array(&self.data_type));
         }
 
-        if !rep_levels.is_empty() && rep_levels[0] != 0 {
-            // This implies either the source data was invalid, or the leaf column
-            // reader did not correctly delimit semantic records
+        if rep_levels[0] != 0 {
             return Err(general_err!("first repetition level of batch must be 0"));
         }
 
-        // A non-nullable list has a single definition level indicating if the list is empty

Review Comment:
   Sorry, that comment didn't survive the refactor. I've added it back, along with the table:
   
   ```rust
           // Definition levels identify whether each list boundary contributes a
           // child slot. For a nullable list, the states are:
           //
           //   d >= self.def_level     : list is present with a child item
           //   d == self.def_level - 1 : list is present but empty
           //   d <= self.def_level - 2 : list is null
           //
           // Required lists do not have the null state, but still use the same
           // `d >= self.def_level` test to distinguish child items from empty
           // lists. Repetition levels identify list boundaries and whether a
           // child item belongs directly to this list or to a nested child list.
           let levels_len = def_levels.len();
   ```
   
   Ref: https://github.com/HippoBaro/arrow-rs/blob/6598016d3ce76145594d913c4de468e28b9587a6/parquet/src/arrow/array_reader/list_array.rs#L126-L137
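
   For readers skimming the thread, the table in that comment can be sketched as a small standalone function. This is only an illustration, not the code in the PR: `ListSlot`, `classify`, and the `nullable` flag are hypothetical names, and treating every level below `list_def_level` as "empty" for a required list is a simplification made for this sketch.

   ```rust
   /// What a definition level says about one list boundary.
   /// (Hypothetical type, introduced only for this illustration.)
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   enum ListSlot {
       /// d >= list_def_level: the list is present and contributes a child item.
       Item,
       /// d == list_def_level - 1: the list is present but empty.
       Empty,
       /// d <= list_def_level - 2: the list is null (nullable lists only).
       Null,
   }

   /// Classify a definition level `d` against the list's own level.
   /// Assumption: `list_def_level` plays the role of `self.def_level` in the
   /// comment above, and `nullable` says whether the list itself may be null.
   fn classify(d: i16, list_def_level: i16, nullable: bool) -> ListSlot {
       if d >= list_def_level {
           ListSlot::Item
       } else if !nullable || d == list_def_level - 1 {
           // Required lists have no null state, so anything below the
           // threshold is treated as an empty list in this sketch.
           ListSlot::Empty
       } else {
           ListSlot::Null
       }
   }
   ```

   With a nullable list at `list_def_level == 2`, the levels 2, 1 and 0 map to `Item`, `Empty` and `Null` respectively; a required list at level 1 only ever yields `Item` or `Empty`.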


