HawaiianSpork opened a new issue, #6803:
URL: https://github.com/apache/arrow-rs/issues/6803

   **Describe the bug**
   When Arrow Flight encodes a record batch slice whose first row has an offset of zero, it reuses the offsets of the non-sliced array. This not only encodes an offsets array larger than the slice, it also produces incorrect offsets, because rows that lie outside the slice are never removed.
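
   To make the offset handling concrete, here is a small standalone illustration (plain Rust with made-up numbers, not arrow-rs internals):

   ```rust
   #[test]
   fn sliced_offsets_should_be_truncated() {
       // A list array with 4 rows carries 5 offsets, e.g. [0, 3, 5, 9, 12].
       let full_offsets = [0i32, 3, 5, 9, 12];

       // A slice of the first 2 rows should be encoded with only the first
       // len + 1 = 3 offsets. Because the slice's first offset is 0, the
       // writer currently reuses the whole buffer instead, so the encoded
       // offsets still describe rows that are not part of the slice.
       let slice_len = 2;
       let expected: Vec<i32> = full_offsets[..=slice_len].to_vec();
       assert_eq!(expected, vec![0, 3, 5]);
   }
   ```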
   
   **To Reproduce**
   ```rust
   use std::sync::Arc;

   use arrow_array::builder::{GenericListBuilder, UInt32Builder};
   use arrow_array::{GenericListArray, OffsetSizeTrait, RecordBatch};
   use arrow_schema::{DataType, Field, Schema};

   /// Builds a List<List<UInt32>> array whose leading rows keep the value
   /// offsets at zero, so a slice of the batch can start at offset zero.
   fn generate_nested_list_data_starting_at_zero<O: OffsetSizeTrait>() -> GenericListArray<O> {
       let mut ls =
           GenericListBuilder::<O, _>::new(GenericListBuilder::<O, _>::new(UInt32Builder::new()));

       // 999 rows, each a list containing a single empty inner list;
       // all value offsets stay at zero.
       for _i in 0..999 {
           ls.values().append(true);
           ls.append(true);
       }

       // One row with ten inner lists of four values each.
       for j in 0..10 {
           for value in [j, j, j, j] {
               ls.values().values().append_value(value);
           }
           ls.values().append(true);
       }
       ls.append(true);

       // 9,000 more rows, each with ten four-value inner lists.
       for i in 0..9_000 {
           for j in 0..10 {
               for value in [i + j, i + j, i + j, i + j] {
                   ls.values().values().append_value(value);
               }
               ls.values().append(true);
           }
           ls.append(true);
       }

       ls.finish()
   }

   #[test]
   fn encode_nested_lists_starting_at_zero() {
       let inner_int = Arc::new(Field::new("item", DataType::UInt32, true));
       let inner_list_field = Arc::new(Field::new("item", DataType::List(inner_int), true));
       let list_field = Field::new("val", DataType::List(inner_list_field), true);
       let schema = Arc::new(Schema::new(vec![list_field]));

       let values = Arc::new(generate_nested_list_data_starting_at_zero::<i32>());

       let in_batch = RecordBatch::try_new(schema, vec![values]).unwrap();
       // `roundtrip_ensure_sliced_smaller` is the existing test helper this
       // snippet was written against (not reproduced here).
       roundtrip_ensure_sliced_smaller(in_batch, 1);
   }
   ```
   
   Running this test results in an error: all lists in the round-tripped batch come back empty.
   
   **Expected behavior**
   No error is thrown and the list offsets are properly encoded.
   
   **Additional context**
   This line appears to be the problem:
   https://github.com/apache/arrow-rs/blob/5a86db3c47d692584f694701780b0d8944a5e984/arrow-ipc/src/writer.rs#L1433
   Changing that line to `0 => offset_slice.iter().map(|x| *x).collect(),` fixes the problem.
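
   For reference, here is a minimal standalone sketch of the rebasing that appears to be intended (my own naming, not the actual `writer.rs` code): always rebuild the offsets from the sliced window, which is a no-op rebase when the first offset is already zero.

   ```rust
   /// Hypothetical helper: rebuild the offsets for a slice from the window
   /// `offset_slice` (slice rows + 1 entries) taken out of the full offsets
   /// buffer, rebasing them so the first offset becomes zero.
   fn reencode_sliced_offsets(offset_slice: &[i32]) -> Vec<i32> {
       let start = offset_slice[0];
       offset_slice.iter().map(|x| x - start).collect()
   }

   #[test]
   fn rebased_offsets_match_slice_length() {
       let full = [0i32, 3, 5, 9, 12];
       // First two rows: the first offset is already zero, but the result
       // must still be truncated to len + 1 = 3 entries.
       assert_eq!(reencode_sliced_offsets(&full[..3]), vec![0, 3, 5]);
       // Rows 2..4: rebased so the slice starts at zero.
       assert_eq!(reencode_sliced_offsets(&full[2..5]), vec![0, 4, 7]);
   }
   ```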

