etseidl opened a new issue, #9923:
URL: https://github.com/apache/arrow-rs/issues/9923

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   A recent batch of issues discovered via fuzzing exposes a long-standing flaw 
in the Thrift decoder: it does nothing to verify that the encoded values are 
of the expected type. 
   
   As an example, when reading a list, a header is decoded from the Thrift 
stream containing the number of elements and the element type. The expected 
element type is known, but it is never matched with the element type in the 
header. An example from the current code where a list of structs is expected, 
but `list_ident.element_type` is not checked:
   
   ```rust
   let list_ident = prot.read_list_begin()?;
   if schema_descr.num_columns() != list_ident.size as usize {
       return Err(general_err!(
           "Column count mismatch. Schema has {} columns while Row Group has {}",
           schema_descr.num_columns(),
           list_ident.size
       ));
   }
   for i in 0..list_ident.size as usize {
       let col = read_column_chunk(prot, &schema_descr.columns()[i], i, options)?;
       row_group.columns.push(col);
   }
   ```
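
   A minimal sketch of what such a check could look like. The `ElementType` 
and `ListIdentifier` types and the `check_list_header` helper below are 
hypothetical stand-ins written for illustration, not the decoder's actual 
definitions:

   ```rust
   // Hypothetical stand-ins for the decoder's types (assumed for this sketch).
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub enum ElementType {
       Bool,
       I32,
       I64,
       Binary,
       List,
       Struct,
   }

   #[derive(Debug, Clone, Copy)]
   pub struct ListIdentifier {
       pub element_type: ElementType,
       pub size: i32,
   }

   /// Compares the element type from the list header against the expected
   /// one and returns the element count if they match.
   pub fn check_list_header(
       ident: &ListIdentifier,
       expected: ElementType,
   ) -> Result<usize, String> {
       if ident.element_type != expected {
           return Err(format!(
               "Expected list of {:?} but found list of {:?}",
               expected, ident.element_type
           ));
       }
       // Reject negative sizes instead of letting `as usize` wrap them.
       if ident.size < 0 {
           return Err(format!("Invalid list size {}", ident.size));
       }
       Ok(ident.size as usize)
   }
   ```

   The check is a single comparison per list header rather than per element, 
so under these assumptions its cost should be small next to decoding the 
elements themselves.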
   
   The same is true when decoding structs. The field header has a field type 
component, but it is similarly not checked.
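
   For struct fields, a comparable check might look like the following 
sketch. `FieldType`, `FieldIdentifier`, and `check_field_type` are 
hypothetical names chosen for illustration. Note that the Thrift compact 
protocol encodes boolean field values in the field header's type, so a real 
check would have to treat that case specially:

   ```rust
   // Hypothetical stand-ins for the decoder's field header types.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub enum FieldType {
       Stop,
       Bool,
       I32,
       I64,
       Binary,
       List,
       Struct,
   }

   #[derive(Debug, Clone, Copy)]
   pub struct FieldIdentifier {
       pub field_type: FieldType,
       pub id: i16,
   }

   /// Returns an error naming the field when the type in the field header
   /// differs from the type the schema expects for that field.
   pub fn check_field_type(
       ident: &FieldIdentifier,
       expected: FieldType,
       name: &str,
   ) -> Result<(), String> {
       if ident.field_type == expected {
           Ok(())
       } else {
           Err(format!(
               "Field {} (id {}) has type {:?}, expected {:?}",
               name, ident.id, ident.field_type, expected
           ))
       }
   }
   ```

   An error produced this way points at the mismatched field directly, 
rather than surfacing later as an unrelated failure.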
   
   **Describe the solution you'd like**
   Where there is no negative impact on performance, validation should be 
added, if for no other reason than to catch certain errors earlier and provide 
more useful error information. When left unchecked, processing an int array as 
a struct array can lead to some very misleading errors, like "required field 
foo is missing".
   
   **Describe alternatives you've considered**
   If validation is too burdensome, it can be skipped except in cases where 
its absence can lead to panics.
   
   **Additional context**
   The code generated by the old Thrift compiler also did no validation; the 
generated code for C++ and Python does perform some checks.
   

