kosiew commented on code in PR #22995:
URL: https://github.com/apache/datafusion/pull/22995#discussion_r3434531839
##########
datafusion/datasource-parquet/src/bloom_filter.rs:
##########
@@ -39,7 +39,8 @@ pub(crate) struct BloomFilterStatistics {
/// Value:
/// * [`Sbbf`] (Bloom filter),
/// * Parquet physical [`Type`] needed to evaluate literals against the filter
- column_sbbf: HashMap<String, (Sbbf, Type)>,
+ /// * Type length from the Parquet column descriptor
+ column_sbbf: HashMap<String, (Sbbf, Type, i32)>,
Review Comment:
This tuple now carries a pretty important `type_length` contract. A small
named struct, such as `struct ColumnBloomFilter { sbbf: Sbbf, physical_type:
Type, type_length: i32 }`, could make the invariant clearer and help avoid
accidentally mixing up tuple fields at call sites.
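
A minimal sketch of what that named struct could look like. `Sbbf` and `Type` here are local stand-ins for `parquet::bloom_filter::Sbbf` and `parquet::basic::Type` so the snippet compiles on its own, and the `insert` and `get` helpers are hypothetical, not methods from the PR:

```rust
use std::collections::HashMap;

// Stand-in for `parquet::bloom_filter::Sbbf` (assumption: the real type
// comes from the parquet crate and is not reproduced here).
#[derive(Debug, Default)]
struct Sbbf;

// Stand-in for `parquet::basic::Type`, reduced to two variants.
#[allow(non_camel_case_types, dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Type {
    INT32,
    FIXED_LEN_BYTE_ARRAY,
}

/// Everything needed to evaluate a literal against one column's filter.
#[allow(dead_code)]
#[derive(Debug)]
struct ColumnBloomFilter {
    /// The Bloom filter itself.
    sbbf: Sbbf,
    /// Parquet physical type of the column.
    physical_type: Type,
    /// Type length from the Parquet column descriptor.
    type_length: i32,
}

#[derive(Debug, Default)]
struct BloomFilterStatistics {
    /// Key: column name. Value: the filter plus its type metadata.
    column_sbbf: HashMap<String, ColumnBloomFilter>,
}

impl BloomFilterStatistics {
    // Hypothetical helper, for illustration only.
    fn insert(&mut self, column: &str, filter: ColumnBloomFilter) {
        self.column_sbbf.insert(column.to_string(), filter);
    }

    // Hypothetical helper, for illustration only.
    fn get(&self, column: &str) -> Option<&ColumnBloomFilter> {
        self.column_sbbf.get(column)
    }
}
```

With named fields, a call site reads `filter.type_length` rather than `filter.2`, so swapping two positional fields by accident is no longer possible.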
##########
datafusion/datasource-parquet/src/bloom_filter.rs:
##########
@@ -375,6 +409,40 @@ mod tests {
.await
}
+ #[tokio::test]
+ async fn test_row_group_bloom_filter_pruning_predicate_decimal128() {
Review Comment:
Nice regression coverage for the fixed-width truncation path. It might be
worth adding a negative decimal case as well, since Parquet fixed-len decimal
bytes depend on two's-complement sign extension and truncation. For example,
you could write row groups with negative values and assert that a predicate
like `decimal_col = -500` keeps only the matching row group.
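
To illustrate why the negative case exercises a different path, here is a rough sketch of encoding an unscaled `i128` decimal value as big-endian two's-complement bytes of a given `type_length`. The function name `decimal128_to_fixed_len_bytes` is hypothetical and this is not the PR's implementation, only the general technique:

```rust
/// Encode `value` as big-endian two's-complement in `type_length` bytes.
/// Returns `None` if the length is invalid or the value does not fit.
fn decimal128_to_fixed_len_bytes(value: i128, type_length: usize) -> Option<Vec<u8>> {
    let full = value.to_be_bytes(); // always 16 bytes
    if type_length == 0 || type_length > full.len() {
        return None;
    }

    // Split into the leading bytes we drop and the trailing bytes we keep.
    let (dropped, kept) = full.split_at(full.len() - type_length);

    // Truncation is lossless only when every dropped byte is pure sign
    // extension (0xFF for negatives, 0x00 otherwise) and the sign bit of
    // the kept bytes still matches the sign of the value.
    let sign_byte: u8 = if value < 0 { 0xFF } else { 0x00 };
    let sign_bit_matches = (kept[0] & 0x80 != 0) == (value < 0);

    if dropped.iter().all(|b| *b == sign_byte) && sign_bit_matches {
        Some(kept.to_vec())
    } else {
        None
    }
}
```

For a 4-byte column, `500` becomes `00 00 01 F4` while `-500` becomes `FF FF FE 0C`, so positive-only tests never exercise the `0xFF` sign-extension bytes. Code that zero-pads instead of sign-extending, or that truncates from the wrong end, would only fail on the negative case.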