[
https://issues.apache.org/jira/browse/ARROW-17007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kouhei Sutou closed ARROW-17007.
--------------------------------
Resolution: Invalid
Could you report this to https://github.com/apache/arrow-rs/issues/new/choose ?
Apache Arrow Rust's issue tracker has moved from Jira to
https://github.com/apache/arrow-rs/issues on GitHub.
> [Rust][Parquet] array reader for list columns fails to decode if batches fall on row group boundaries
> ----------------------------------------------------------------------------------------------------
>
> Key: ARROW-17007
> URL: https://issues.apache.org/jira/browse/ARROW-17007
> Project: Apache Arrow
> Issue Type: Bug
> Components: Parquet, Rust
> Reporter: Tim Wilson
> Priority: Major
>
> This appears to be a variant of ARROW-9790, but specifically for list
> columns. It affects the latest released version (17.0.0) of the Rust
> crates arrow and parquet.
> {code:rust}
> use arrow::array::{Int32Builder, ListBuilder};
> use arrow::datatypes::{DataType, Field, Schema};
> use arrow::record_batch::RecordBatch;
> use parquet::arrow::{ArrowReader, ArrowWriter, ParquetFileArrowReader};
> use parquet::file::properties::WriterProperties;
> use parquet::file::reader::SerializedFileReader;
> use std::error::Error;
> use std::sync::Arc;
> use tempfile::NamedTempFile;
>
> fn main() -> Result<(), Box<dyn Error>> {
>     let schema = Arc::new(Schema::new(vec![
>         Field::new("int", DataType::Int32, false),
>         Field::new(
>             "list",
>             DataType::List(Box::new(Field::new("item", DataType::Int32, true))),
>             false,
>         ),
>     ]));
>
>     // Write 20 rows (two batches of 10) into row groups of at most 8 rows.
>     let temp_file = NamedTempFile::new()?;
>     let mut writer = ArrowWriter::try_new(
>         temp_file.reopen()?,
>         schema.clone(),
>         Some(
>             WriterProperties::builder()
>                 .set_max_row_group_size(8)
>                 .build(),
>         ),
>     )?;
>     for _ in 0..2 {
>         let mut int_builder = Int32Builder::new(10);
>         let mut list_builder = ListBuilder::new(Int32Builder::new(10));
>         for i in 0..10 {
>             int_builder.append_value(i)?;
>             list_builder.append(true)?;
>         }
>         let batch = RecordBatch::try_new(
>             schema.clone(),
>             vec![
>                 Arc::new(int_builder.finish()),
>                 Arc::new(list_builder.finish()),
>             ],
>         )?;
>         writer.write(&batch)?;
>     }
>     writer.close()?;
>
>     // Read back with a batch size equal to the row group size.
>     let file_reader = Arc::new(SerializedFileReader::new(temp_file.reopen()?)?);
>     let mut file_reader = ParquetFileArrowReader::new(file_reader);
>     let mut record_reader = file_reader.get_record_reader(8)?;
>     assert_eq!(8, record_reader.next().unwrap()?.num_rows());
>     assert_eq!(8, record_reader.next().unwrap()?.num_rows());
>     assert_eq!(4, record_reader.next().unwrap()?.num_rows());
>     Ok(())
> }
> {code}
> Fails with `Error: ParquetError("Parquet error: Not all children array length are the same!")`.
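The arithmetic behind the title: with max_row_group_size = 8 and 20 rows written in total, the file ends up with row groups of [8, 8, 4] rows, and reading with a record batch size of 8 produces batches of [8, 8, 4] as well, so every batch boundary lands exactly on a row-group boundary. A minimal sketch of that chunking in plain Rust (no arrow/parquet dependency; `split` is a hypothetical helper written for illustration, not part of either crate):

```rust
/// Hypothetical helper: split `total` rows into consecutive chunks of at
/// most `chunk` rows, mirroring how the writer cuts row groups and the
/// reader cuts record batches in the repro above.
fn split(total: usize, chunk: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut remaining = total;
    while remaining > 0 {
        let n = remaining.min(chunk);
        out.push(n);
        remaining -= n;
    }
    out
}

fn main() {
    // 20 rows, max_row_group_size = 8 -> row groups of [8, 8, 4].
    let row_groups = split(20, 8);
    // 20 rows, record batch size = 8 -> batches of [8, 8, 4].
    let batches = split(20, 8);
    assert_eq!(row_groups, vec![8, 8, 4]);
    // Same chunk size on both sides: every batch boundary coincides
    // with a row-group boundary, which is the case the reader mishandles.
    assert_eq!(row_groups, batches);
    println!("row groups: {:?}, batches: {:?}", row_groups, batches);
}
```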
--
This message was sent by Atlassian Jira
(v8.20.10#820010)