scovich commented on code in PR #7757:
URL: https://github.com/apache/arrow-rs/pull/7757#discussion_r2164969718


##########
parquet-variant/src/utils.rs:
##########
@@ -69,33 +69,33 @@ pub(crate) fn string_from_slice(slice: &[u8], range: 
Range<usize>) -> Result<&st
 /// * `range` - The range to search in
 /// * `target` - The target value to search for
 /// * `key_extractor` - A function that extracts a comparable key from slice 
elements.
-///   This function can fail and return an error.
+///   This function can fail and return None.
 ///
 /// # Returns
-/// * `Ok(Ok(index))` - Element found at the given index
-/// * `Ok(Err(index))` - Element not found, but would be inserted at the given 
index
-/// * `Err(e)` - Key extraction failed with error `e`
-pub(crate) fn try_binary_search_range_by<K, E, F>(
+/// * `Some(Ok(index))` - Element found at the given index
+/// * `Some(Err(index))` - Element not found, but would be inserted at the 
given index
+/// * `None` - Key extraction failed
+pub(crate) fn try_binary_search_range_by<K, F>(
     range: Range<usize>,
     target: &K,
-    mut key_extractor: F,
-) -> Result<Result<usize, usize>, E>
+    key_extractor: F,
+) -> Option<Result<usize, usize>>

Review Comment:
   This really needs 
[try_trait_v2](https://github.com/rust-lang/rust/issues/84277) feature to 
stabilize... then the passed in key extractor could decide the return type 
(`Option` vs. `Result`)



##########
parquet-variant/src/variant/object.rs:
##########
@@ -145,7 +157,19 @@ impl<'m, 'v> VariantObject<'m, 'v> {
     }
 
     /// Get a field's name by index in `0..self.len()`
-    pub fn field_name(&self, i: usize) -> Result<&'m str, ArrowError> {
+    pub fn field_name(&self, i: usize) -> Option<&'m str> {
+        if i >= self.num_elements {
+            return None;
+        }
+
+        match self.try_field_name(i) {
+            Ok(field_name) => Some(field_name),
+            Err(err) => panic!("validation error: {}", err),

Review Comment:
   Given that this panic is ostensibly unreachable, would it be a good place 
for something like this?
   ```rust
   let result = self.try_field_name(i);
   debug_assert!(matches!(result, Ok(_)), "Unexpected validation failure: 
{result:?}");
   result.ok()
   ```
   



##########
parquet-variant/src/variant/object.rs:
##########
@@ -145,7 +157,19 @@ impl<'m, 'v> VariantObject<'m, 'v> {
     }
 
     /// Get a field's name by index in `0..self.len()`
-    pub fn field_name(&self, i: usize) -> Result<&'m str, ArrowError> {
+    pub fn field_name(&self, i: usize) -> Option<&'m str> {
+        if i >= self.num_elements {
+            return None;
+        }
+
+        match self.try_field_name(i) {
+            Ok(field_name) => Some(field_name),
+            Err(err) => panic!("validation error: {}", err),

Review Comment:
   I don't love panics... is there a reason out of bounds is None but any other 
problem is a panic? If anything, this is backward -- the caller could prevent 
an out of bounds panic by checking length before making this call. But they 
can't prevent a panic caused by invalid bytes in the buffer.



##########
parquet-variant/src/variant/list.rs:
##########
@@ -123,7 +123,11 @@ impl<'m, 'v> VariantList<'m, 'v> {
         self.len() == 0
     }
 
-    pub fn get(&self, index: usize) -> Result<Variant<'m, 'v>, ArrowError> {
+    pub fn get(&self, index: usize) -> Option<Variant<'m, 'v>> {
+        self.get_err(index).ok()
+    }
+
+    fn get_err(&self, index: usize) -> Result<Variant<'m, 'v>, ArrowError> {

Review Comment:
   Agree. And if we do make the constructor less thorough then we have to be 
more careful in our implementations of `get` et al, because they could be 
encountering bad bytes for the first time.



##########
parquet-variant/src/variant/object.rs:
##########
@@ -245,22 +255,35 @@ mod tests {
         assert!(!variant_obj.is_empty());
 
         // Test field access
-        let active_field = variant_obj.field_by_name("active").unwrap();
+        let active_field = variant_obj.get("active");
         assert!(active_field.is_some());
         assert_eq!(active_field.unwrap().as_boolean(), Some(true));
 
-        let age_field = variant_obj.field_by_name("age").unwrap();
+        let age_field = variant_obj.get("age");
         assert!(age_field.is_some());
         assert_eq!(age_field.unwrap().as_int8(), Some(42));
 
-        let name_field = variant_obj.field_by_name("name").unwrap();
+        let name_field = variant_obj.get("name");
         assert!(name_field.is_some());
         assert_eq!(name_field.unwrap().as_string(), Some("hello"));
 
         // Test non-existent field
-        let missing_field = variant_obj.field_by_name("missing").unwrap();
+        let missing_field = variant_obj.get("missing");
         assert!(missing_field.is_none());
 
+        // Fixme: This assertion will panic! That is not good
+        // let missing_field_name = variant_obj.field_name(3);
+        // assert!(missing_field_name.is_none());

Review Comment:
   The assertion on L290 fails? Or the `field_name(3)` call on L289 panics?
   
   Looking at the code, it seems like a missing bounds check in `field_name` 
which reads garbage from the wrong part of the slice, which I guess must 
somehow cause a panic inside the metadata dictionary lookup?
   
   ... but this PR adds the missing bounds checks, so I would expect the panic 
went away? If so, it would be good to triage the actual panic (in metadata 
dictionary code), since the bounds check here only masks the real problem.



##########
parquet-variant/src/variant/object.rs:
##########
@@ -145,7 +157,19 @@ impl<'m, 'v> VariantObject<'m, 'v> {
     }
 
     /// Get a field's name by index in `0..self.len()`
-    pub fn field_name(&self, i: usize) -> Result<&'m str, ArrowError> {
+    pub fn field_name(&self, i: usize) -> Option<&'m str> {
+        if i >= self.num_elements {
+            return None;
+        }
+
+        match self.try_field_name(i) {
+            Ok(field_name) => Some(field_name),
+            Err(err) => panic!("validation error: {}", err),

Review Comment:
   Heh. And now I see outdated comments that recommended moving away from 
`ok()`...



##########
parquet-variant/src/variant/object.rs:
##########
@@ -145,7 +157,19 @@ impl<'m, 'v> VariantObject<'m, 'v> {
     }
 
     /// Get a field's name by index in `0..self.len()`
-    pub fn field_name(&self, i: usize) -> Result<&'m str, ArrowError> {
+    pub fn field_name(&self, i: usize) -> Option<&'m str> {
+        if i >= self.num_elements {
+            return None;
+        }
+
+        match self.try_field_name(i) {
+            Ok(field_name) => Some(field_name),
+            Err(err) => panic!("validation error: {}", err),

Review Comment:
   I guess the constructor currently validates the bytes, so this should 
actually be unreachable code. 
   Still, `ok()` seems cleaner?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to