alamb commented on code in PR #7915:
URL: https://github.com/apache/arrow-rs/pull/7915#discussion_r2205367028


##########
parquet-variant/src/builder.rs:
##########
@@ -350,14 +378,59 @@ impl<S: AsRef<str>> FromIterator<S> for MetadataBuilder {
     }
 }
 
-impl<S: AsRef<str>> Extend<S> for MetadataBuilder {
+impl<S: AsRef<str>> Extend<S> for DefaultMetadataBuilder {
     fn extend<T: IntoIterator<Item = S>>(&mut self, iter: T) {
         for field_name in iter {
             self.upsert_field_name(field_name.as_ref());
         }
     }
 }
 
+/// Read-only metadata builder that validates field names against an existing 
metadata dictionary

Review Comment:
   > What do we gain by using views at this low level?
   
   The major benefit in my mind is that it means any row level reading of 
`Variant` is the same for shredded and unshredded Variants. 
   
   For example we can implement the general purpose `variant_get` kernel in 
terms of a `Varaint` at first
   
   Then we can add the special paths / implementations for `varaint_get` on 
shredded arrays that are more efficient as follow on work. 
   
   
   > If we could get away with "views" here on the unshredding path, why can't 
the native builder get away with "views" internally to avoid all the repeated 
copying of bytes as we unwind nested builders?
   
   I am not quite sure what you are proposing here, but it sounds good to me
   
   > Isolated reaction: I would favor Variant::ShreddedObject as its own enum 
variant, rather than making Variant::Object more complex. They're pretty 
fundamentally different cases.
   
   I agree
   
   
   What I plan to do is to try to help along 
https://github.com/apache/arrow-rs/pull/7914 and the variant_get API in 
https://github.com/apache/arrow-rs/pull/7919
   
   With those in place I think we'll be in a place to try and sketch out how 
shredded variants could work



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to