alamb commented on code in PR #8600:
URL: https://github.com/apache/arrow-rs/pull/8600#discussion_r2449475543


##########
arrow-array/src/builder/generic_bytes_builder.rs:
##########
@@ -348,6 +348,50 @@ impl<O: OffsetSizeTrait> std::fmt::Write for 
GenericStringBuilder<O> {
     }
 }
 
+/// A byte size value representing the number of bytes to allocate per string 
in [`GenericStringBuilder`]
+///
+/// To create a [`GenericStringBuilder`] using `.with_capacity` we are 
required to provide: \
+/// - `item_capacity` - the row count \
+/// - `data_capacity` - total string byte count \
+///
+/// We will use the `AVERAGE_STRING_LENGTH` * row_count for `data_capacity`. \
+///
+/// These capacities are preallocation hints used to improve performance,
+/// but consuquences of passing a hint too large or too small should be 
negligible.
+const AVERAGE_STRING_LENGTH: usize = 16;
+/// Trait for string-like array builders
+///
+/// This trait provides unified interface for builders that append string-like 
data
+/// such as [`GenericStringBuilder<O>`] and 
[`crate::builder::StringViewBuilder`]
+pub trait StringLikeArrayBuilder: ArrayBuilder {
+    /// Returns a human-readable type name for the builder.
+    fn type_name() -> &'static str;
+
+    /// Creates a new builder with the given row capacity.

Review Comment:
   pedantically, this also allocates 16x the capacity for the variable payload 
in StringArray/ LargeStringArray too (it isn't just the row capacity)



##########
parquet-variant-compute/src/variant_to_arrow.rs:
##########
@@ -413,6 +432,13 @@ macro_rules! define_variant_to_primitive_builder {
     }
 }
 
+define_variant_to_primitive_builder!(
+    struct VariantToStringArrowBuilder<'a, B: StringLikeArrayBuilder>
+    |capacity| -> B { B::with_capacity(capacity) },
+    |value| value.as_string(),
+    type_name: B::type_name()
+);

Review Comment:
   that is pretty nice



##########
arrow-array/src/builder/generic_bytes_builder.rs:
##########
@@ -348,6 +348,40 @@ impl<O: OffsetSizeTrait> std::fmt::Write for 
GenericStringBuilder<O> {
     }
 }
 
+const AVERAGE_STRING_LENGTH: usize = 16;
+/// Trait for string-like array builders
+///
+/// This trait provides unified interface for builders that append string-like 
data
+/// such as [`GenericStringBuilder<O>`] and 
[`crate::builder::StringViewBuilder`]
+pub trait StringLikeArrayBuilder: ArrayBuilder {

Review Comment:
   FWIW this will go into arrow 57.1.0 (not in 57.0.0, which is due out 
tomorrow). 



##########
arrow-array/src/builder/generic_bytes_builder.rs:
##########
@@ -348,6 +348,40 @@ impl<O: OffsetSizeTrait> std::fmt::Write for 
GenericStringBuilder<O> {
     }
 }
 
+const AVERAGE_STRING_LENGTH: usize = 16;
+/// Trait for string-like array builders
+///
+/// This trait provides unified interface for builders that append string-like 
data
+/// such as [`GenericStringBuilder<O>`] and 
[`crate::builder::StringViewBuilder`]
+pub trait StringLikeArrayBuilder: ArrayBuilder {

Review Comment:
   I think it is ok to make it pub -- this seems like a reasonable API to me. 
We actually have something like this in DataFusion already so it makes sense



##########
arrow-array/src/builder/generic_bytes_builder.rs:
##########
@@ -348,6 +348,40 @@ impl<O: OffsetSizeTrait> std::fmt::Write for 
GenericStringBuilder<O> {
     }
 }
 
+const AVERAGE_STRING_LENGTH: usize = 16;

Review Comment:
   My only real concern with this hint is that it will work for some users and 
not for others (e.g. it will over allocate memory for short strings). Short of 
forcing the caller to pass in the variable capacity I can't see any way around 
it that doesn't have other tradeoffs
   
   if the need lower level control they can always use the underlying builder, 
so I think this default is ok



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to