alamb commented on code in PR #7718:
URL: https://github.com/apache/arrow-rs/pull/7718#discussion_r2158948808
##########
parquet-variant/src/variant.rs:
##########
@@ -1376,11 +1405,11 @@ impl<'v> From<&'v [u8]> for Variant<'_, 'v> {
impl<'v> From<&'v str> for Variant<'_, 'v> {
fn from(value: &'v str) -> Self {
- if value.len() < 64 {
- Variant::ShortString(value)
- } else {
- Variant::String(value)
+ if let Ok(value) = ShortString::try_new(value) {
Review Comment:
Since the arrow error has an allocated string in it, that means constructing
a Variant::String via this impl will result in an allocation, which is not ideal
Thus I recommend that for performance reasons this directly checks the
length of the string rather than relying on `try_new`
##########
parquet-variant/src/variant.rs:
##########
@@ -544,6 +544,35 @@ impl<'m, 'v> VariantList<'m, 'v> {
}
}
+const MAX_SHORT_STRING_SIZE: usize = 0x3F;
+
+/// A Variant [`ShortString`]
+#[derive(Debug, Clone, PartialEq)]
Review Comment:
I recommend making this `Copy` as well (to match the underlying `& str`)
```suggestion
#[derive(Debug, Clone, Copy, PartialEq)]
##########
parquet-variant/src/builder.rs:
##########
@@ -114,11 +113,11 @@ fn make_room_for_header(buffer: &mut Vec<u8>, start_pos:
usize, header_size: usi
/// };
/// assert_eq!(
/// variant_object.field_by_name("first_name").unwrap(),
-/// Some(Variant::ShortString("Jiaying"))
+/// Some(Variant::ShortString(ShortString::try_new("Jiaying").unwrap()))
Review Comment:
These examples might be nicer with the `From` impl (but we can do that as a
follow on PR)
```suggestion
/// Some(Variant::from("Jiaying"))
```
##########
parquet-variant/src/variant.rs:
##########
@@ -544,6 +544,35 @@ impl<'m, 'v> VariantList<'m, 'v> {
}
}
+const MAX_SHORT_STRING_SIZE: usize = 0x3F;
+
+/// A Variant [`ShortString`]
+#[derive(Debug, Clone, PartialEq)]
+pub struct ShortString<'a>(pub(crate) &'a str);
+
+impl<'a> ShortString<'a> {
Review Comment:
I think that would be the most user friendly to make it easier to find for
Rust beginners who aren't super familar with the `From` impl stuff
I would personally suggest naming it `as_str()` to make it super explicit
##########
parquet-variant/src/variant.rs:
##########
@@ -544,6 +544,35 @@ impl<'m, 'v> VariantList<'m, 'v> {
}
}
+const MAX_SHORT_STRING_SIZE: usize = 0x3F;
+
+/// A Variant [`ShortString`]
Review Comment:
```suggestion
/// A Variant [`ShortString`]
///
/// This implementation is a zero cost wrapper over `&str` that ensures
/// the length of the underlying string is a valid Variant short string (63
bytes or less)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]