Joseph-Rance commented on code in PR #4773:
URL: https://github.com/apache/arrow-rs/pull/4773#discussion_r1342562958
##########
parquet_derive/README.md:
##########
@@ -77,16 +77,55 @@ writer.close_row_group(row_group).unwrap();
writer.close().unwrap();
```
+Example usage of deriving a `RecordReader` for your struct:
+
+```rust
+use parquet::file::{serialized_reader::SerializedFileReader, reader::FileReader};
+use parquet_derive::ParquetRecordReader;
+
+#[derive(ParquetRecordReader)]
+struct ACompleteRecord {
+ pub a_bool: bool,
+ pub a_string: String,
+ pub i16: i16,
+ pub i32: i32,
+ pub u64: u64,
+ pub isize: isize,
+ pub float: f32,
+ pub double: f64,
+ pub now: chrono::NaiveDateTime,
+ pub byte_vec: Vec<u8>,
+}
+
+// Initialize your parquet file
+let reader = SerializedFileReader::new(file).unwrap();
+let mut row_group = reader.get_row_group(0).unwrap();
+
+// create your records to read into
+let mut chunks = vec![ACompleteRecord{ ... }];
Review Comment:
We could, but the reader currently iterates over the columns of the parquet
file. For a struct with fields `a` and `b`, it reads and sets all `a` values,
and then all `b` values. Constructing a struct directly requires `a` and `b`
at the same time, so we would first have to read the entire file into a
separate buffer for each field, and only then could we build the structs.
There is no real problem with doing this, but it wastes memory: the file's
contents are held in memory twice (once in the structs we create and once,
temporarily, in the per-field buffers), all to save one line of code.
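To make the trade-off concrete, here is a minimal sketch of the two approaches in plain Rust. It does not use the `parquet` or `parquet_derive` APIs: `Record`, `fill_in_place`, and `build_from_columns` are hypothetical names for illustration only, and plain iterators stand in for the column readers.

```rust
// Hypothetical two-field record standing in for a user's derived struct.
#[derive(Debug, Default, Clone, PartialEq)]
struct Record {
    a: i32,
    b: String,
}

// Column-wise fill: the caller pre-allocates the records and each column
// is written into them in turn, so no per-column buffer is needed.
fn fill_in_place(
    records: &mut [Record],
    col_a: impl Iterator<Item = i32>,
    col_b: impl Iterator<Item = String>,
) {
    // First pass: set every `a`.
    for (rec, a) in records.iter_mut().zip(col_a) {
        rec.a = a;
    }
    // Second pass: set every `b`.
    for (rec, b) in records.iter_mut().zip(col_b) {
        rec.b = b;
    }
}

// Direct construction: every column must be buffered first, because a
// `Record` cannot be built until both `a` and `b` are available.
fn build_from_columns(
    col_a: impl Iterator<Item = i32>,
    col_b: impl Iterator<Item = String>,
) -> Vec<Record> {
    // Temporary per-field buffers, alive while the records are being built.
    let buf_a: Vec<i32> = col_a.collect();
    let buf_b: Vec<String> = col_b.collect();
    buf_a
        .into_iter()
        .zip(buf_b)
        .map(|(a, b)| Record { a, b })
        .collect()
}
```

Both functions produce the same records. The difference is that `fill_in_place` needs the caller to allocate the `Vec<Record>` up front (the extra line of code), while `build_from_columns` needs the temporary `buf_a` and `buf_b` alongside the output.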