Morgan Cassels created ARROW-13120:
--------------------------------------

             Summary: [Rust][Parquet] Cannot read multiple batches from Parquet file with string list column
                 Key: ARROW-13120
                 URL: https://issues.apache.org/jira/browse/ARROW-13120
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Morgan Cassels
         Attachments: test.parquet

This issue only occurs when the batch size is smaller than the number of rows
in the table. The attached Parquet file `test.parquet` has 31430 rows and a
single column containing string lists. The issue does not appear to occur for
Parquet files with integer list columns.

 

```
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);

    let mut record_batches = Vec::new();

    // A batch size of 1024 is smaller than the 31430 rows in the file,
    // so the reader has to produce more than one batch.
    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();

    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
```
```
---- arrow::arrow_reader::tests::failing_test stdout ----

thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected infallable creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("offsets do not start at zero")', arrow/src/array/array_list.rs:195:45

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

```
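To illustrate what the error message refers to, here is a minimal sketch in plain Rust with no dependency on the `arrow` or `parquet` crates. It assumes the check behind the panic requires a list array's offsets to begin at zero, and shows how offsets copied for a second batch out of a larger buffer would violate that unless they are rebased. The function names (`validate_offsets`, `slice_offsets`, `rebase_offsets`) are hypothetical and are not taken from the Arrow code base; whether missing rebasing is the actual cause of this bug is an assumption, not something confirmed here.

```rust
/// Checks the invariant named in the panic message: the offsets of a
/// standalone list array start at zero and never decrease.
/// (Hypothetical helper, not the arrow crate's implementation.)
fn validate_offsets(offsets: &[i32]) -> Result<(), String> {
    if offsets.first() != Some(&0) {
        return Err("offsets do not start at zero".to_string());
    }
    if offsets.windows(2).any(|w| w[0] > w[1]) {
        return Err("offsets are not monotonically non-decreasing".to_string());
    }
    Ok(())
}

/// Copies the offsets for rows `start..start + len` out of a larger
/// offsets buffer without adjusting them.
fn slice_offsets(offsets: &[i32], start: usize, len: usize) -> Vec<i32> {
    offsets[start..=start + len].to_vec()
}

/// Shifts sliced offsets so that the first one is zero.
fn rebase_offsets(offsets: &[i32]) -> Vec<i32> {
    let base = offsets.first().copied().unwrap_or(0);
    offsets.iter().map(|o| o - base).collect()
}

fn main() {
    // Four list rows holding 2, 1, 3 and 2 child values respectively.
    let offsets = vec![0, 2, 3, 6, 8];
    let batch_size = 2;

    // The first batch starts at row 0, so its offsets already begin at zero.
    let first = slice_offsets(&offsets, 0, batch_size);
    // The second batch starts at row 2, so its raw offsets begin at 3.
    let second = slice_offsets(&offsets, 2, batch_size);

    println!("first batch:    {:?} -> {:?}", first, validate_offsets(&first));
    println!("second batch:   {:?} -> {:?}", second, validate_offsets(&second));

    let rebased = rebase_offsets(&second);
    println!("second rebased: {:?} -> {:?}", rebased, validate_offsets(&rebased));
}
```

In this sketch only the first batch passes validation as-is, which would be consistent with the failure appearing only when the batch size is smaller than the row count.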



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
