[
https://issues.apache.org/jira/browse/ARROW-6901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paddy Horan resolved ARROW-6901.
--------------------------------
Fix Version/s: 1.0.0
Resolution: Fixed
Issue resolved by pull request 5672
[https://github.com/apache/arrow/pull/5672]
> [Rust][Parquet] SerializedFileWriter writes total_num_rows as zero
> ------------------------------------------------------------------
>
> Key: ARROW-6901
> URL: https://issues.apache.org/jira/browse/ARROW-6901
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust
> Affects Versions: 0.14.1, 0.15.0
> Reporter: Matthew Franglen
> Assignee: Matthew Franglen
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> The SerializedFileWriter does not update total_num_rows at any point. This
> results in consistently writing zero as the number of rows in the file.
>
> This code will fail:
> {code:java}
> let data = vec![vec![1, 2, 3, 4, 5]];
> let file: File = ...;
> let schema = Rc::new(
> types::Type::group_type_builder("schema")
> .with_fields(&mut vec![Rc::new(
> types::Type::primitive_type_builder("col1", Type::INT32)
> .with_repetition(Repetition::REQUIRED)
> .build()
> .unwrap(),
> )])
> .build()
> .unwrap(),
> );
> let props = Rc::new(WriterProperties::builder().build());
> let mut file_writer =
> SerializedFileWriter::new(file.try_clone().unwrap(), schema,
> props).unwrap();
> let mut rows: i64 = 0;
> for subset in &data {
> let mut row_group_writer = file_writer.next_row_group().unwrap();
> let col_writer = row_group_writer.next_column().unwrap();
> if let Some(mut writer) = col_writer {
> match writer {
> ColumnWriter::Int32ColumnWriter(ref mut typed) => {
> rows += typed.write_batch(&subset[..], None, None).unwrap()
> as i64;
> }
> _ => {
> unimplemented!();
> }
> }
> row_group_writer.close_column(writer).unwrap();
> }
> file_writer.close_row_group(row_group_writer).unwrap();
> }
> file_writer.close().unwrap();
> let reader = SerializedFileReader::new(file).unwrap();
> assert_eq!(reader.num_row_groups(), data.len());
> assert_eq!(reader.metadata().file_metadata().num_rows(), rows, "row count in
> metadata not equal to number of rows written");
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)