etseidl commented on code in PR #9924:
URL: https://github.com/apache/arrow-rs/pull/9924#discussion_r3190624252


##########
parquet/src/file/serialized_reader.rs:
##########
@@ -2117,7 +2117,7 @@ mod tests {
         let ret = SerializedFileReader::new(Bytes::copy_from_slice(&data));
         assert_eq!(
             ret.err().unwrap().to_string(),
-            "Parquet error: Received empty union from remote ColumnOrder"
+            "Parquet error: Expected list element type of Struct but got List"

Review Comment:
   The data for this test is
   ```
   [255, 172, 1, 0, 50, 82, 65, 73, 1, 0, 0, 0, 169, 168, 168, 162, 87, 255, 16, 0, 0, 0, 80, 65, 82, 49]
              |                                                              |            |
              start of footer                                                length       PAR1
   ```
   
   The `0x01` at the start of the footer decodes as a delta of 0 with a field
   type of `BooleanTrue`. Because the delta is 0, a varint is read to obtain the
   field id; this consumes the `0` and returns a field id of 0, which is then
   skipped as unknown. The `50` (hex `0x32`) encodes a delta of 3 with a field
   type of `BooleanFalse`. Because an `i64` is expected, the `82` (hex `0x52`)
   is consumed and returned as the value for `num_rows` (field 3). `65` (hex
   `0x41`) is delta 4 -> field 7, with a type of `BooleanTrue`. Field 7 is a
   list of structs, so the `73` (hex `0x49`) is read as a list header encoding
   4 elements of type `List`. With the fix in this PR, the `List` is compared
   to the expected `Struct` type and an error is returned. Without the fix,
   decoding continues until a different error is detected.
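
The nibble arithmetic in the walk-through above can be checked with a small sketch. It assumes the standard Thrift compact-protocol type ids (1 = `BooleanTrue`, 2 = `BooleanFalse`, 9 = `List`, 12 = `Struct`). The names `CompactType`, `decode_field_header`, and `decode_list_header` are hypothetical helpers for illustration only; they are not the parquet crate's API.

```rust
/// Thrift compact-protocol type ids (low nibble of a field or list header).
/// Hypothetical enum for illustration; not the parquet crate's own type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CompactType {
    Stop,
    BooleanTrue,
    BooleanFalse,
    Byte,
    I16,
    I32,
    I64,
    Double,
    Binary,
    List,
    Set,
    Map,
    Struct,
}

impl CompactType {
    /// Map a 4-bit type id to a type, or `None` if the id is not defined.
    fn from_nibble(n: u8) -> Option<Self> {
        use CompactType::*;
        Some(match n {
            0 => Stop,
            1 => BooleanTrue,
            2 => BooleanFalse,
            3 => Byte,
            4 => I16,
            5 => I32,
            6 => I64,
            7 => Double,
            8 => Binary,
            9 => List,
            10 => Set,
            11 => Map,
            12 => Struct,
            _ => return None,
        })
    }
}

/// Split a field header byte into (field-id delta, field type).
/// A delta of 0 means the field id follows separately as a varint.
fn decode_field_header(byte: u8) -> Option<(u8, CompactType)> {
    Some((byte >> 4, CompactType::from_nibble(byte & 0x0f)?))
}

/// Split a short-form list header byte into (element count, element type).
/// A count nibble of 15 means the real count follows as a varint,
/// which this sketch does not handle.
fn decode_list_header(byte: u8) -> Option<(u8, CompactType)> {
    let count = byte >> 4;
    if count == 0x0f {
        return None;
    }
    Some((count, CompactType::from_nibble(byte & 0x0f)?))
}
```

For example, `decode_field_header(65)` gives a delta of 4 with type `BooleanTrue`, and `decode_list_header(73)` gives 4 elements of type `List`, which is the value that fails the comparison against the expected `Struct`.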
   
   Checking the expected field types would detect this even earlier (when the
   third byte of the footer, `50`, is consumed), but that is left for future
   work.
                                                              


