alamb commented on code in PR #7915:
URL: https://github.com/apache/arrow-rs/pull/7915#discussion_r2205367028
##########
parquet-variant/src/builder.rs:
##########
@@ -350,14 +378,59 @@ impl<S: AsRef<str>> FromIterator<S> for MetadataBuilder {
}
}
-impl<S: AsRef<str>> Extend<S> for MetadataBuilder {
+impl<S: AsRef<str>> Extend<S> for DefaultMetadataBuilder {
fn extend<T: IntoIterator<Item = S>>(&mut self, iter: T) {
for field_name in iter {
self.upsert_field_name(field_name.as_ref());
}
}
}
+/// Read-only metadata builder that validates field names against an existing
metadata dictionary
Review Comment:
> What do we gain by using views at this low level?
The major benefit in my mind is that it means any row level reading of
`Variant` is the same for shredded and unshredded Variants.
For example we can implement the general purpose `variant_get` kernel in
terms of a `Varaint` at first
Then we can add the special paths / implementations for `varaint_get` on
shredded arrays that are more efficient as follow on work.
> If we could get away with "views" here on the unshredding path, why can't
the native builder get away with "views" internally to avoid all the repeated
copying of bytes as we unwind nested builders?
I am not quite sure what you are proposing here, but it sounds good to me
> Isolated reaction: I would favor Variant::ShreddedObject as its own enum
variant, rather than making Variant::Object more complex. They're pretty
fundamentally different cases.
I agree
What I plan to do is to try to help along
https://github.com/apache/arrow-rs/pull/7914 and the variant_get API in
https://github.com/apache/arrow-rs/pull/7919
With those in place I think we'll be in a place to try and sketch out how
shredded variants could work
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]