alamb commented on code in PR #9646:
URL: https://github.com/apache/arrow-datafusion/pull/9646#discussion_r1530875351
##########
datafusion/core/src/datasource/physical_plan/parquet/statistics.rs:
##########
@@ -109,17 +109,24 @@ macro_rules! get_statistic {
}
}
}
- // type not supported yet
+ // type not fully supported yet
ParquetStatistics::FixedLenByteArray(s) => {
match $target_arrow_type {
- // just support the decimal data type
+ // just support specific logical data types, there are others
+ // each with their own ordering
Some(DataType::Decimal128(precision, scale)) => {
Some(ScalarValue::Decimal128(
Some(from_bytes_to_i128(s.$bytes_func())),
*precision,
*scale,
))
}
+ Some(DataType::FixedSizeBinary(size)) => {
+ Some(ScalarValue::FixedSizeBinary(
Review Comment:
Should we verify that the resulting bytes have the correct size? As in,
verify that `*size` is the same as `s.$bytes_func().to_vec().len()`?
If the sizes do not match, I suggest we ignore the value (i.e. don't use
the statistics), the same way as is done for non-UTF8 values in string
statistics. Perhaps we can at least log when this happens with `debug!`.
Since this data can come from any arbitrary parquet writer, I think it is
good practice to validate it before using it.
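The suggested check could look something like the following sketch. This is a
hypothetical, standalone illustration (the function name and signature are
invented here, not the actual DataFusion code, which lives inside the
`get_statistic!` macro): validate that the statistics bytes match the declared
`FixedSizeBinary` width before converting them, and skip the statistics
otherwise.

```rust
// Hedged sketch of the reviewer's suggestion, not the real implementation:
// only accept FixedLenByteArray statistics whose length matches the declared
// FixedSizeBinary size; otherwise log (here via eprintln!, in the real code
// plausibly `log::debug!`) and return None so the statistics are ignored.
fn fixed_size_binary_stat(size: i32, bytes: &[u8]) -> Option<Vec<u8>> {
    if bytes.len() == size as usize {
        Some(bytes.to_vec())
    } else {
        // Data may come from any arbitrary parquet writer, so do not error;
        // just note the mismatch and skip the value.
        eprintln!(
            "ignoring FixedSizeBinary statistics: expected {size} bytes, got {}",
            bytes.len()
        );
        None
    }
}

fn main() {
    // Matching length: the statistics value is usable.
    assert_eq!(
        fixed_size_binary_stat(4, &[1, 2, 3, 4]),
        Some(vec![1, 2, 3, 4])
    );
    // Mismatched length: the statistics value is dropped, not an error.
    assert_eq!(fixed_size_binary_stat(4, &[1, 2, 3]), None);
    println!("ok");
}
```

The key design point, mirroring how non-UTF8 string statistics are handled, is
that a malformed value degrades gracefully: the query still runs, it just
cannot use that statistic for pruning.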
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]