etseidl opened a new issue, #9923:
URL: https://github.com/apache/arrow-rs/issues/9923
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
A recent batch of issues discovered via fuzzing exposes a long-standing flaw
in the Thrift decoder: it does nothing to verify that the encoded values are
of the expected type.
For example, when reading a list, the decoder first reads a list header from
the Thrift stream; the header contains the number of elements and their
element type. The expected element type is known, but it is never compared
with the element type in the header. Here is an example from the current code,
where a list of structs is expected but `list_ident.element_type` is not checked:
```rust
let list_ident = prot.read_list_begin()?;
// Only the element count is checked; list_ident.element_type is never
// compared with the expected element type (struct).
if schema_descr.num_columns() != list_ident.size as usize {
    return Err(general_err!(
        "Column count mismatch. Schema has {} columns while Row Group has {}",
        schema_descr.num_columns(),
        list_ident.size
    ));
}
for i in 0..list_ident.size as usize {
    // Each element is decoded as a ColumnChunk struct regardless of the header.
    let col = read_column_chunk(prot, &schema_descr.columns()[i], i, options)?;
    row_group.columns.push(col);
}
```
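A minimal sketch of the missing list check, assuming hypothetical `ElementType` and `ListIdentifier` types that stand in for whatever the decoder actually uses (the real names and variants in the parquet crate may differ). The idea is a single enum comparison right after the list header is read, so a mismatch fails immediately with a message naming both types:

```rust
// Hypothetical stand-ins for the decoder's own types; the real names and
// variants in the parquet crate may differ.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ElementType {
    Bool, Byte, I16, I32, I64, Double, Binary, List, Set, Map, Struct,
}

/// Hypothetical decoded list header: element type plus element count.
#[derive(Debug, Clone, Copy)]
struct ListIdentifier {
    element_type: ElementType,
    size: i32,
}

/// Hypothetical error type; a real decoder would use the crate's own error.
#[derive(Debug, PartialEq)]
struct DecodeError(String);

/// Reject a list header whose element type differs from the one the schema
/// expects, before any element is decoded.
fn expect_list_of(ident: &ListIdentifier, expected: ElementType) -> Result<(), DecodeError> {
    if ident.element_type == expected {
        Ok(())
    } else {
        Err(DecodeError(format!(
            "expected a list of {:?}, found a list of {:?} ({} elements)",
            expected, ident.element_type, ident.size
        )))
    }
}
```

In the snippet above, such a check would sit directly after `prot.read_list_begin()?`, e.g. `expect_list_of(&list_ident, ElementType::Struct)?;`, with the error mapped into the crate's own error type. The cost is one comparison per list, not per element.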
The same is true when decoding structs. The field header has a field type
component, but it is similarly not checked.
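The struct case can be sketched the same way. Here `FieldType`, `FieldIdentifier` and the two-field layout (`1: required i32 foo`, `2: required list<...> bar`) are hypothetical; the point is that a known field id whose wire type does not match is reported as a type error on the spot, rather than possibly surfacing later as something misleading like "required field foo is missing". Unknown field ids are still skipped, as Thrift decoders do for forward compatibility:

```rust
// Hypothetical stand-ins for the decoder's own types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FieldType {
    Bool, Byte, I16, I32, I64, Double, Binary, List, Set, Map, Struct,
}

/// Hypothetical decoded field header: wire type plus field id.
#[derive(Debug, Clone, Copy)]
struct FieldIdentifier {
    field_type: FieldType,
    id: i16,
}

#[derive(Debug, PartialEq)]
struct DecodeError(String);

/// Expected wire types for a hypothetical struct:
/// `1: required i32 foo` and `2: required list<...> bar`.
fn expected_field_type(id: i16) -> Option<FieldType> {
    match id {
        1 => Some(FieldType::I32),
        2 => Some(FieldType::List),
        _ => None,
    }
}

/// Ok(true): known field with the expected type, so decode it.
/// Ok(false): unknown field id, so skip it.
/// Err: known field id whose wire type does not match.
fn check_field(ident: &FieldIdentifier) -> Result<bool, DecodeError> {
    match expected_field_type(ident.id) {
        None => Ok(false),
        Some(t) if t == ident.field_type => Ok(true),
        Some(t) => Err(DecodeError(format!(
            "field {}: expected {:?}, found {:?}",
            ident.id, t, ident.field_type
        ))),
    }
}
```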
**Describe the solution you'd like**
Validation should be added wherever it has no negative impact on performance,
if for no other reason than to catch certain errors earlier and provide more
useful error messages. Left unchecked, processing an int array as a struct
array can lead to some very misleading errors, like "required field foo is
missing".
**Describe alternatives you've considered**
If full validation is too burdensome, it could be skipped except in cases
where its absence can lead to panics.
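For the fallback of keeping only the checks whose absence can panic, a hypothetical example is a list size used directly as an allocation hint: a negative `i32` cast with `as usize` becomes a huge value, and `Vec::with_capacity` then panics on capacity overflow. The sketch below (the `checked_list_len` helper is hypothetical) rejects negative sizes and, assuming every element occupies at least one byte of input as in the compact protocol, sizes larger than the bytes remaining:

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
struct DecodeError(String);

/// Hypothetical helper: turn a decoded list size into a length that is safe
/// to use for allocation.
fn checked_list_len(size: i32, remaining_bytes: usize) -> Result<usize, DecodeError> {
    // A negative size would wrap to a huge usize if cast with `as`.
    let len = usize::try_from(size)
        .map_err(|_| DecodeError(format!("negative list size {}", size)))?;
    // Assumes each element needs at least one byte on the wire, so a count
    // larger than the remaining input cannot be valid.
    if len > remaining_bytes {
        return Err(DecodeError(format!(
            "list size {} exceeds {} remaining bytes",
            len, remaining_bytes
        )));
    }
    Ok(len)
}
```

Usage would look like `let mut v: Vec<T> = Vec::with_capacity(checked_list_len(size, remaining)?);`, which bounds the allocation by the input size instead of trusting the header.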
**Additional context**
The old Thrift-compiler-generated code also did no validation, although the
generated code for C++ and Python does perform some checks.