scovich commented on code in PR #7666:
URL: https://github.com/apache/arrow-rs/pull/7666#discussion_r2151061142


##########
parquet-variant/src/variant.rs:
##########
@@ -377,30 +534,134 @@ impl<'m, 'v> VariantArray<'m, 'v> {
             .checked_add(value_bytes)
             .ok_or_else(overflow)?;
 
-        // Skip num_elements bytes to read the offsets
-        let start_field_offset_from_first_value_byte =
-            offset_size.unpack_usize(self.value_data, first_offset_byte, 
index)?;
-        let end_field_offset_from_first_value_byte =
-            offset_size.unpack_usize(self.value_data, first_offset_byte, index 
+ 1)?;
+        // Verify that the last offset array entry is inside the value slice
+        let last_offset_byte = first_offset_byte + n_offsets * offset_size as 
usize;
+        if last_offset_byte > value.len() {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "Last offset array entry at offset {} with length {} is 
outside the value slice of length {}",
+                last_offset_byte,
+                offset_size as usize,
+                value.len()
+            )));
+        }
+
+        // Verify that the value of the last offset array entry fits inside 
the value slice
+        let last_offset = offset_size.unpack_usize(value, first_offset_byte, 
num_elements)?;
+        if first_value_byte + last_offset > value.len() {
+            return Err(ArrowError::InvalidArgumentError(format!(
+                "Last offset value {} at offset {} is outside the value slice 
of length {}",
+                last_offset,
+                first_value_byte,
+                value.len()
+            )));
+        }
+
+        Ok(Self {
+            offset_size,
+            is_large,
+            num_elements,
+            first_offset_byte,
+            first_value_byte,
+        })
+    }
+
+    /// Returns the number of elements in this list
+    pub fn num_elements(&self) -> usize {
+        self.num_elements
+    }
+
+    /// Returns the offset size in bytes
+    pub fn offset_size(&self) -> usize {
+        self.offset_size as _

Review Comment:
   The `OffsetSizeBytes` enum occupies 1 byte in practice, so I don't think 
we'd save much? 
   The offsets (2x `usize`) will anyway dwarf it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to