veluca93 commented on code in PR #6021:
URL: https://github.com/apache/arrow-rs/pull/6021#discussion_r1669391133
##########
parquet/src/arrow/array_reader/byte_view_array.rs:
##########
@@ -315,26 +315,30 @@ impl ByteViewArrayDecoderPlain {
let mut read = 0;
output.views.reserve(to_read);
while self.offset < self.buf.len() && read != to_read {
- if self.offset + 4 > self.buf.len() {
- return Err(ParquetError::EOF("eof decoding byte
array".into()));
- }
- let len_bytes: [u8; 4] = unsafe {
- buf.get_unchecked(self.offset..self.offset + 4)
- .try_into()
- .unwrap()
+ let len_bytes = buf.get(self.offset..self.offset + 4);
+ let len = match len_bytes {
+ None => {
+ return Err(ParquetError::EOF("eof decoding byte
array".into()));
+ }
+ Some(l) => u32::from_le_bytes(l.try_into().unwrap()),
};
- let len = u32::from_le_bytes(len_bytes);
let start_offset = self.offset + 4;
let end_offset = start_offset + len as usize;
- if end_offset > buf.len() {
+ if end_offset > buf.len() || end_offset < start_offset {
return Err(ParquetError::EOF("eof decoding byte
array".into()));
}
if self.validate_utf8 {
- check_valid_utf8(unsafe {
buf.get_unchecked(start_offset..end_offset) })?;
+ check_valid_utf8(
+ // SAFETY: bounds are checked above.
+ unsafe { buf.get_unchecked(start_offset..end_offset) },
+ )?;
}
+ // SAFETY: bounds are checked above, block_id comes from a call to
append_block, and
+ // we are dealing with byte arrays with no alignment requirements.
+ // FIXME: what about UTF8 arrays if validate_utf8 is false?
Review Comment:
If I understand correctly, this is up to the caller to ensure - i.e. the
caller must ensure (i.e. when calling `new`) that `validate_utf8` is true if
the data does not come from a StringViewArray. If that is correct, then this
leads to potential UB if the caller does not uphold this contract, which means
that `new` should be an unsafe function and this property should ideally be
documented as a safety invariant for the struct.
But I may have misunderstood something :-)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]