Re: [PR] Add `RecordReader` trait and proc macro to implement it for a struct [arrow-rs]

via GitHub Thu, 26 Oct 2023 07:57:46 -0700


tustvold commented on code in PR #4773:
URL: https://github.com/apache/arrow-rs/pull/4773#discussion_r1373300875



##########
parquet_derive/src/parquet_field.rs:
##########
@@ -348,6 +417,47 @@ impl Field {
         }
     }
 
+    fn copied_direct_fields(&self) -> proc_macro2::TokenStream {
+        let field_name = &self.ident;
+
+        let value = match self.third_party_type {
+            Some(ThirdPartyType::ChronoNaiveDateTime) => {
+                quote! { 
::chrono::naive::NaiveDateTime::from_timestamp_millis(vals[i]).unwrap() }
+            }
+            Some(ThirdPartyType::ChronoNaiveDate) => {
+                quote! {
+                    
::chrono::naive::NaiveDate::from_num_days_from_ce_opt(vals[i]

Review Comment:
   Why is this logic different to above?



##########
parquet_derive/README.md:
##########
@@ -71,22 +71,61 @@ let mut row_group = writer.next_row_group().unwrap();
 let chunks = vec![ACompleteRecord{...}];
 
 // The derived `RecordWriter` takes over here
-(&chunks[..]).write_to_row_group(&mut row_group);
+chunks.write_to_row_group(&mut row_group);
 
 writer.close_row_group(row_group).unwrap();
 writer.close().unwrap();
 ```
 
+Example usage of deriving a `RecordReader` for your struct:
+
+```rust
+use parquet::file::{serialized_reader::SerializedFileReader, 
reader::FileReader};
+use parquet_derive::ParquetRecordReader;
+
+#[derive(ParquetRecordReader)]
+struct ACompleteRecord {
+    pub a_bool: bool,
+    pub a_string: String,
+    pub i16: i16,
+    pub i32: i32,
+    pub u64: u64,
+    pub isize: isize,
+    pub float: f32,
+    pub double: f64,
+    pub now: chrono::NaiveDateTime,
+    pub byte_vec: Vec<u8>,
+}
+
+// Initialize your parquet file
+let reader = SerializedFileReader::new(file).unwrap();
+let mut row_group = reader.get_row_group(0).unwrap();
+
+// create your records vector to read into
+let mut chunks: Vec<ACompleteRecord> = Vec::new();
+
+// The derived `RecordReader` takes over here
+chunks.read_from_row_group(&mut *row_group, 1).unwrap();
+```
+
 ## Features
 
 - [x] Support writing `String`, `&str`, `bool`, `i32`, `f32`, `f64`, `Vec<u8>`
 - [ ] Support writing dictionaries
 - [x] Support writing logical types like timestamp
-- [x] Derive definition_levels for `Option`
-- [ ] Derive definition levels for nested structures
+- [x] Derive definition_levels for `Option` for writing
+- [ ] Derive definition levels for nested structures for writing
 - [ ] Derive writing tuple struct
 - [ ] Derive writing `tuple` container types
 
+- [x] Support reading `String`, `&str`, `bool`, `i32`, `f32`, `f64`, `Vec<u8>`
+- [ ] Support reading/writing dictionaries
+- [x] Support reading/writing logical types like timestamp
+- [ ] Derive definition_levels for `Option` for reading
+- [ ] Derive definition levels for nested structures for reading

Review Comment:
   ```suggestion
   - [ ] Handle definition_levels for `Option` for reading
   - [ ] Handle definition levels for nested structures for reading
   ```



##########
parquet_derive/src/lib.rs:
##########
@@ -95,7 +95,7 @@ pub fn parquet_record_writer(input: proc_macro::TokenStream) 
-> proc_macro::Toke
         field_infos.iter().map(|x| x.parquet_type()).collect();
 
     (quote! {
-    impl #generics ::parquet::record::RecordWriter<#derived_for #generics> for 
&[#derived_for #generics] {
+    impl #generics ::parquet::record::RecordWriter<#derived_for #generics> for 
Vec<#derived_for #generics> {

Review Comment:
   Why this change? I think it makes sense that the writer doesn't require an 
owned container as it isn't mutating it?



##########
parquet_derive/src/lib.rs:
##########
@@ -137,3 +137,89 @@ pub fn parquet_record_writer(input: 
proc_macro::TokenStream) -> proc_macro::Toke
     }
   }).into()
 }
+
+/// Derive flat, simple RecordReader implementations. Works by parsing
+/// a struct tagged with `#[derive(ParquetRecordReader)]` and emitting
+/// the correct writing code for each field of the struct. Column readers
+/// are generated in the order they are defined.
+///
+/// It is up to the programmer to keep the order of the struct
+/// fields lined up with the schema.
+///
+/// Example:
+///
+/// ```ignore
+/// use parquet::file::{serialized_reader::SerializedFileReader, 
reader::FileReader};
+/// use parquet_derive::{ParquetRecordReader};
+///
+/// #[derive(ParquetRecordReader)]
+/// struct ACompleteRecord {
+///     pub a_bool: bool,
+///     pub a_string: String,
+/// }
+///
+/// pub fn read_some_records() -> ACompleteRecord {

Review Comment:
   ```suggestion
   /// pub fn read_some_records() -> Vec<ACompleteRecord> {
   ```
   Perhaps?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Add `RecordReader` trait and proc macro to implement it for a struct [arrow-rs]

Reply via email to