lyne7-sc opened a new issue, #22994:
URL: https://github.com/apache/datafusion/issues/22994

   ### Describe the bug
   
   Parquet bloom filter pruning can return incorrect results for decimal 
columns encoded as `FIXED_LEN_BYTE_ARRAY`.
   
   When a decimal column is encoded as `FIXED_LEN_BYTE_ARRAY`, the bloom filter 
is built from the physical Parquet bytes. DataFusion currently checks the bloom 
filter using a fixed-width integer byte representation, which may not match the 
fixed byte length used in the Parquet file.
   
   This can cause false negatives in bloom filter pruning and incorrectly skip 
row groups that contain matching rows.
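
The mismatch can be illustrated with a small Python sketch. It is not DataFusion code, and it rests on two assumptions: that the writer stores `DECIMAL(19,2)` in the smallest big-endian two's-complement width that fits the precision, and that the reader probes the bloom filter with a 16-byte (128-bit) representation. The helper names `min_flba_width` and `to_big_endian` are hypothetical.

```python
def min_flba_width(precision: int) -> int:
    """Smallest byte count whose signed range holds every decimal of this precision."""
    width = 1
    # The largest unscaled value of a decimal with this precision is 10^precision - 1.
    while 2 ** (8 * width - 1) - 1 < 10 ** precision - 1:
        width += 1
    return width


def to_big_endian(unscaled: int, width: int) -> bytes:
    """Big-endian two's-complement encoding of the unscaled decimal value."""
    return unscaled.to_bytes(width, byteorder="big", signed=True)


unscaled = 500  # 5.00 at scale 2 has the unscaled integer value 500

# Assumption: the file uses the minimal width for precision 19.
file_width = min_flba_width(19)
file_bytes = to_big_endian(unscaled, file_width)

# Assumption: the reader probes with a fixed 128-bit integer encoding.
probe_bytes = to_big_endian(unscaled, 16)

print("file width :", file_width)
print("file bytes :", file_bytes.hex())
print("probe bytes:", probe_bytes.hex())
print("identical  :", file_bytes == probe_bytes)
```

Both byte strings encode the same number, but a bloom filter hashes the raw bytes, so strings of different lengths are treated as different values. Under these assumptions, a probe built at the wrong width reports the value as absent even though the row group contains it.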
   
   ### To Reproduce
   
   ```sql
   COPY (
     SELECT CAST(column1 AS DECIMAL(19,2)) AS decimal_col
     FROM (VALUES (1), (2), (3), (4), (5), (6))
   ) TO '/tmp/df_decimal_bloom_repro'
   STORED AS PARQUET
   OPTIONS (
     'format.max_row_group_size' '2',
     'format.bloom_filter_on_write' 'true',
     'format.statistics_enabled' 'none'
   );
   
   SELECT COUNT(*) AS cnt
   FROM '/tmp/df_decimal_bloom_repro'
   WHERE decimal_col = CAST(5 AS DECIMAL(19,2));
   
   SET datafusion.execution.parquet.bloom_filter_on_read = false;
   
   SELECT COUNT(*) AS cnt
   FROM '/tmp/df_decimal_bloom_repro'
   WHERE decimal_col = CAST(5 AS DECIMAL(19,2));
   ```
   
   The first query returns 0, while the second query, run with `bloom_filter_on_read` disabled, returns 1.
   
   ### Expected behavior
   
   Both queries should return 1. Bloom filter pruning should not remove a row 
group that contains the matching decimal value.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
