tustvold opened a new issue, #2253:
URL: https://github.com/apache/arrow-rs/issues/2253
**Describe the bug**
`ComplexObjectArrayReader` does not use `RecordReader` and consequently does
not correctly delimit semantic records when reading. In particular, it may
yield a batch of values that ends part way through a row.
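
To illustrate what delimiting records means here, the following is a minimal sketch, not the actual `RecordReader` code. It assumes Dremel-style repetition levels, where a level of 0 marks the first value of a new row. The helper name `levels_for_records` is hypothetical.

```rust
/// Hypothetical helper: returns how many leading repetition levels make up
/// at most `max_records` complete rows. A repetition level of 0 starts a row.
fn levels_for_records(rep_levels: &[i16], max_records: usize) -> usize {
    let mut records = 0;
    for (i, &level) in rep_levels.iter().enumerate() {
        if level == 0 {
            if records == max_records {
                // The next row would exceed the limit, so stop just before it.
                return i;
            }
            records += 1;
        }
    }
    // Ran out of levels. A real reader would also have to check whether the
    // last row continues into the next page before treating it as complete.
    rep_levels.len()
}

fn main() {
    // Three rows with 3, 1, and 2 values respectively.
    let rep_levels = [0_i16, 1, 1, 0, 0, 1];
    let first = levels_for_records(&rep_levels, 2);
    let second = levels_for_records(&rep_levels[first..], 2);
    println!("first batch: {first} levels, second batch: {second} levels");
}
```

A reader that counts levels instead of rows has no such boundary check, so a batch can end between two values of the same row.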
**To Reproduce**
```
fn test_decimal_list() {
    let decimals = Decimal128Array::from_iter_values([1, 2, 3, 4, 5, 6, 7, 8]);

    // [[], [1], [2, 3], null, [4], null, [6, 7, 8]]
    let data = ArrayDataBuilder::new(ArrowDataType::List(Box::new(Field::new(
        "item",
        decimals.data_type().clone(),
        false,
    ))))
    .len(7)
    .add_buffer(Buffer::from_iter([0_i32, 0, 1, 3, 3, 4, 5, 8]))
    .null_bit_buffer(Some(Buffer::from(&[0b01010111])))
    .child_data(vec![decimals.into_data()])
    .build()
    .unwrap();

    let written = RecordBatch::try_from_iter([(
        "list",
        Arc::new(ListArray::from(data)) as ArrayRef,
    )])
    .unwrap();

    let mut buffer = Vec::with_capacity(1024);
    let mut writer =
        ArrowWriter::try_new(&mut buffer, written.schema(), None).unwrap();
    writer.write(&written).unwrap();
    writer.close().unwrap();

    // Read back in batches of 3 rows
    let read = ParquetFileArrowReader::try_new(Bytes::from(buffer))
        .unwrap()
        .get_record_reader(3)
        .unwrap()
        .collect::<ArrowResult<Vec<_>>>()
        .unwrap();

    assert_eq!(&written.slice(0, 3), &read[0]);
    assert_eq!(&written.slice(3, 3), &read[1]);
    assert_eq!(&written.slice(6, 1), &read[2]);
}
```
Running this test results in:
```
ParquetError("Parquet error: first repetition level of batch must be 0")
```
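
One way to see how this error can arise is the sketch below. It assumes a reader that pulls a fixed number of levels per batch rather than a fixed number of rows; this is a simplification, and the issue itself only states that rows may be truncated. The helper `repetition_levels` is hypothetical. It derives single-level list repetition levels from the offsets and validity used in the test.

```rust
/// Hypothetical helper: repetition levels for a single-level list column.
/// Every row starts with level 0; each further value in a valid row gets 1.
/// Null and empty rows contribute exactly one level.
fn repetition_levels(offsets: &[i32], valid: &[bool]) -> Vec<i16> {
    let mut levels = Vec::new();
    for (row, window) in offsets.windows(2).enumerate() {
        let len = (window[1] - window[0]) as usize;
        levels.push(0_i16);
        if valid[row] {
            levels.extend(std::iter::repeat(1_i16).take(len.saturating_sub(1)));
        }
    }
    levels
}

fn main() {
    // Offsets and validity from the test: [[], [1], [2, 3], null, [4], null, [6, 7, 8]]
    let offsets = [0, 0, 1, 3, 3, 4, 5, 8];
    let valid = [true, true, true, false, true, false, true];
    let levels = repetition_levels(&offsets, &valid);
    println!("{levels:?}"); // [0, 0, 0, 1, 0, 0, 0, 0, 1, 1]

    // Splitting by level count (3) instead of by row cuts [2, 3] in half,
    // so the second chunk begins with a non-zero repetition level.
    for chunk in levels.chunks(3) {
        println!("chunk {chunk:?} starts with repetition level {}", chunk[0]);
    }
}
```

Under that assumption the second chunk is `[1, 0, 0]`, which begins in the middle of the row `[2, 3]` and would trip a check that the first repetition level of a batch is 0.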
**Expected behavior**
We should support reading these nested types, with every batch containing only
complete rows so that the test above passes.
**Additional context**
https://github.com/apache/arrow-rs/issues/1661 tracks removing this
`ArrayReader`, as it is buggy, complex, and no longer really needed.