alamb commented on code in PR #7052:
URL: https://github.com/apache/arrow-rs/pull/7052#discussion_r1944584087


##########
parquet/src/thrift.rs:
##########
@@ -251,7 +255,12 @@ impl TInputProtocol for TCompactSliceInputProtocol<'_> {
 
 fn collection_u8_to_type(b: u8) -> thrift::Result<TType> {
     match b {
-        0x01 => Ok(TType::Bool),
+        // For historical and compatibility reasons, a reader should be capable to deal with both cases.
+        // The only valid value in the original spec was 2, but due to an widespread implementation bug
+        // the defacto standard across large parts of the library became 1 instead.
+        // As a result, both values are now allowed.
+        // https://github.com/apache/thrift/blob/master/doc/specs/thrift-compact-protocol.md#list-and-set
+        0x01 | 0x02 => Ok(TType::Bool),

Review Comment:
   FYI @ritchie46 or @orlp -- we found a bug when reading arguably malformed
Parquet files created by Go (boolean list/set elements encoded with type id 2),
which @jhorstmann fixed in parquet-rs. I did a quick scan of polars and didn't
find the equivalent code, though I don't understand how `polars-parquet` is
structured well enough to really know where to look.
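For readers unfamiliar with the compact protocol, here is a minimal sketch of what the diff changes, assuming a short-form list/set header byte (`sssstttt`: element count in the high nibble, element type id in the low nibble). `ElemType`, `collection_elem_type`, and `decode_short_list_header` are hypothetical names for illustration, not the parquet-rs or `thrift` crate API; the point is only that ids 1 and 2 both map to a boolean element type.

```rust
/// Hypothetical element-type enum for this sketch (mirrors, but is not,
/// the `thrift` crate's `TType`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ElemType {
    Bool,
    Byte,
    I16,
    I32,
    I64,
    Double,
    Binary,
    List,
    Set,
    Map,
    Struct,
}

/// Map the 4-bit element type id from a compact-protocol list/set header.
fn collection_elem_type(id: u8) -> Result<ElemType, String> {
    match id {
        // 2 was the only valid value in the original spec; 1 became the
        // de facto value, so a tolerant reader accepts both.
        0x01 | 0x02 => Ok(ElemType::Bool),
        0x03 => Ok(ElemType::Byte),
        0x04 => Ok(ElemType::I16),
        0x05 => Ok(ElemType::I32),
        0x06 => Ok(ElemType::I64),
        0x07 => Ok(ElemType::Double),
        0x08 => Ok(ElemType::Binary),
        0x09 => Ok(ElemType::List),
        0x0A => Ok(ElemType::Set),
        0x0B => Ok(ElemType::Map),
        0x0C => Ok(ElemType::Struct),
        other => Err(format!("unknown collection element type id {other:#04x}")),
    }
}

/// Decode a short-form list/set header byte. A high nibble of 0b1111 means
/// the count follows as a varint, which this sketch does not handle.
fn decode_short_list_header(byte: u8) -> Result<(usize, ElemType), String> {
    let size = (byte >> 4) as usize;
    if size == 15 {
        return Err("long-form header (varint size) not handled in this sketch".to_string());
    }
    let elem = collection_elem_type(byte & 0x0F)?;
    Ok((size, elem))
}

fn main() {
    // A 3-element bool list header using the de facto id 1 ...
    assert_eq!(decode_short_list_header(0x31), Ok((3, ElemType::Bool)));
    // ... and the same header using the original spec's id 2.
    assert_eq!(decode_short_list_header(0x32), Ok((3, ElemType::Bool)));
    println!("ok");
}
```

Before the fix, a reader whose mapping only treated id 1 as boolean would fail on the second header (0x32), which is the failure mode described above.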



-- 
This is an automated message from the Apache Git Service.