etseidl commented on code in PR #9884:
URL: https://github.com/apache/arrow-rs/pull/9884#discussion_r3183987467
##########
parquet/src/schema/types.rs:
##########
@@ -1399,9 +1399,36 @@ fn schema_from_array_helper<'a>(
}
}
Some(n) => {
+ // `num_children` is an `i32` that came straight from the
+ // attacker-controlled Thrift footer. Validate it before using it
+ // as a `Vec::with_capacity` argument:
+ // 1. Negative counts are nonsensical and would wrap to a huge
+ // `usize` on the cast.
+ // 2. Each declared child must consume at least one schema-array
+ // element (more if the child is itself a group), so a count
+ // that exceeds the number of remaining elements is invalid
+ // and would otherwise drive `Vec::with_capacity(~i32::MAX)`
+ // into a `capacity overflow` panic.
Review Comment:
This is already a large function. There is no need to compound this by
documenting the obvious.
```suggestion
```
##########
parquet/src/schema/types.rs:
##########
@@ -2517,4 +2544,65 @@ mod tests {
let result_schema = parquet_schema_from_array(thrift_schema).unwrap();
assert_eq!(result_schema, expected_schema);
}
+
+ /// Regression test: a `SchemaElement` whose `num_children` is negative
+ /// must surface as a `ParquetError` rather than wrapping into a huge
+ /// `usize` and panicking inside `Vec::with_capacity`. Discovered while
+ /// fuzzing `ParquetMetaDataReader::decode_metadata`
(apache/arrow-rs#5332).
+ #[test]
+ fn test_parquet_schema_from_array_rejects_negative_num_children() {
+ let elements = vec![SchemaElement {
+ r#type: None,
+ type_length: None,
+ repetition_type: Some(Repetition::REQUIRED),
+ name: "schema",
+ num_children: Some(-1),
+ converted_type: None,
+ scale: None,
+ precision: None,
+ field_id: None,
+ logical_type: None,
+ }];
+ let result = parquet_schema_from_array(elements);
+ assert!(result.is_err(), "expected error, got {result:?}");
+ }
+
+ /// Regression test: a `num_children` that exceeds the number of
+ /// remaining schema-array entries must be rejected before
+ /// `Vec::with_capacity` is called. Without the bound the cast to
+ /// `usize` produces an `i32::MAX`-class allocation request that
+ /// trips the `capacity_overflow` panic at
+ /// `alloc::raw_vec::handle_error`.
+ #[test]
+ fn test_parquet_schema_from_array_rejects_overflowing_num_children() {
Review Comment:
This test does not error on main. The other two do not error if the vector
allocation is replaced with
```rust
let mut fields = Vec::with_capacity(usize::try_from(n)?);
```
##########
parquet/src/schema/types.rs:
##########
@@ -1399,9 +1399,36 @@ fn schema_from_array_helper<'a>(
}
}
Some(n) => {
+ // `num_children` is an `i32` that came straight from the
+ // attacker-controlled Thrift footer. Validate it before using it
+ // as a `Vec::with_capacity` argument:
+ // 1. Negative counts are nonsensical and would wrap to a huge
+ // `usize` on the cast.
+ // 2. Each declared child must consume at least one schema-array
+ // element (more if the child is itself a group), so a count
+ // that exceeds the number of remaining elements is invalid
+ // and would otherwise drive `Vec::with_capacity(~i32::MAX)`
+ // into a `capacity overflow` panic.
+ if n < 0 {
+ return Err(general_err!(
+ "Schema element {:?} has negative num_children {}",
+ element.name,
+ n
+ ));
+ }
+ let n_children = n as usize;
Review Comment:
```suggestion
let n_children = usize::try_from(n)?;
```
##########
parquet/src/schema/types.rs:
##########
@@ -1399,9 +1399,36 @@ fn schema_from_array_helper<'a>(
}
}
Some(n) => {
+ // `num_children` is an `i32` that came straight from the
+ // attacker-controlled Thrift footer. Validate it before using it
+ // as a `Vec::with_capacity` argument:
+ // 1. Negative counts are nonsensical and would wrap to a huge
+ // `usize` on the cast.
+ // 2. Each declared child must consume at least one schema-array
+ // element (more if the child is itself a group), so a count
+ // that exceeds the number of remaining elements is invalid
+ // and would otherwise drive `Vec::with_capacity(~i32::MAX)`
+ // into a `capacity overflow` panic.
+ if n < 0 {
+ return Err(general_err!(
+ "Schema element {:?} has negative num_children {}",
+ element.name,
+ n
+ ));
+ }
+ let n_children = n as usize;
+ let remaining = num_elements.saturating_sub(index + 1);
+ if n_children > remaining {
Review Comment:
This is probably a good check to perform, but it would be nice if the unit
tests actually provided coverage here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]