alamb commented on code in PR #9646:
URL: https://github.com/apache/arrow-datafusion/pull/9646#discussion_r1530875351


##########
datafusion/core/src/datasource/physical_plan/parquet/statistics.rs:
##########
@@ -109,17 +109,24 @@ macro_rules! get_statistic {
                     }
                 }
             }
-            // type not supported yet
+            // type not fully supported yet
             ParquetStatistics::FixedLenByteArray(s) => {
                 match $target_arrow_type {
-                    // just support the decimal data type
+                    // just support specific logical data types, there are others each
+                    // with their own ordering
                     Some(DataType::Decimal128(precision, scale)) => {
                         Some(ScalarValue::Decimal128(
                             Some(from_bytes_to_i128(s.$bytes_func())),
                             *precision,
                             *scale,
                         ))
                     }
+                    Some(DataType::FixedSizeBinary(size)) => {
+                        Some(ScalarValue::FixedSizeBinary(

Review Comment:
   Should we verify that the resulting bytes have the correct size? As in, verify that `*size` is the same as `s.$bytes_func().to_vec().len()`?
   
   If not, I suggest we ignore the error (but not use the statistics), the same way as is done for non-UTF8 values in string statistics. Perhaps we can at least log when this happens with `debug!`.
   
   Since this data can come from any arbitrary parquet writer, I think it is good practice to validate what is in there.
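   A minimal sketch of the validation being suggested (hypothetical helper, not the actual macro code from the PR): check that the statistics bytes match the declared `FixedSizeBinary` width before converting them, and otherwise skip the statistics with a log line. `eprintln!` stands in for the `debug!` macro here to keep the sketch self-contained:

   ```rust
   /// Hypothetical helper: returns the statistic bytes only when their length
   /// matches the declared FixedSizeBinary width, otherwise logs and skips them.
   fn fixed_size_binary_statistic(bytes: &[u8], size: i32) -> Option<Vec<u8>> {
       if bytes.len() == size as usize {
           Some(bytes.to_vec())
       } else {
           // Parquet files can come from any writer, so treat a size mismatch
           // as "no statistics available" rather than an error.
           eprintln!(
               "debug: FixedSizeBinary statistics have {} bytes, expected {}",
               bytes.len(),
               size
           );
           None
       }
   }
   ```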



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
