Miha-Cancula-Flarion opened a new pull request, #23245:
URL: https://github.com/apache/datafusion/pull/23245

   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   When https://github.com/apache/parquet-java/ writes a bloom filter for a 
boolean column, it does not actually add the column's values to the filter, so 
the filter ends up empty. 
   
   The DataFusion reader then treats the empty filter as proof that the file 
contains none of the requested values, and incorrectly skips the file while 
reading. 
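
To illustrate the failure mode, here is a minimal sketch of a toy Bloom filter. It is not the Parquet SBBF layout and not DataFusion's code; `ToyBloom` and `can_skip_file` are hypothetical names used only for this example. It shows that a filter into which nothing was ever inserted answers "definitely absent" for every probe, so a reader that trusts it would skip the file.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy Bloom filter: a 64-bit bitset with up to two probe bits per value.
/// Illustration only; this is NOT the Parquet split block layout.
pub struct ToyBloom {
    bits: u64,
}

impl ToyBloom {
    /// Creates an empty filter (no bits set).
    pub fn new() -> Self {
        ToyBloom { bits: 0 }
    }

    /// Derives the probe bits for a value; always sets at least one bit.
    fn mask<T: Hash>(value: &T) -> u64 {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let h = hasher.finish();
        (1u64 << (h & 63)) | (1u64 << ((h >> 8) & 63))
    }

    /// Records a value in the filter.
    pub fn insert<T: Hash>(&mut self, value: &T) {
        self.bits |= Self::mask(value);
    }

    /// Returns false if the value is definitely absent, true if it may be present.
    pub fn might_contain<T: Hash>(&self, value: &T) -> bool {
        let m = Self::mask(value);
        self.bits & m == m
    }
}

/// Hypothetical pruning decision: skip the file when the filter rules the value out.
pub fn can_skip_file(filter: &ToyBloom, wanted: bool) -> bool {
    !filter.might_contain(&wanted)
}
```

With an empty `ToyBloom`, `can_skip_file` returns `true` for both `true` and `false`, even if the file actually holds boolean data. Once the writer inserts a value, the same probe no longer allows the file to be skipped.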
   
   ## What changes are included in this PR?
   
   This change makes it so that we always assume that a boolean column has 
values, essentially ignoring the bloom filter for such columns. 
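
The idea can be sketched roughly as follows. This is an assumption-laden illustration, not the actual diff: `ColumnKind` and `may_prune` are hypothetical names, and the real reader works with its own types. The point is only that the filter's "absent" answer is never trusted for boolean columns.

```rust
/// Hypothetical column type tag, standing in for the reader's real type information.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnKind {
    Boolean,
    Int64,
    Utf8,
}

/// Hypothetical helper: decides whether a Bloom filter's "definitely absent"
/// answer may be used to prune a file.
///
/// For boolean columns the answer is ignored, because some writers leave the
/// filter empty; for other types the filter's answer is used unchanged.
pub fn may_prune(kind: ColumnKind, filter_says_absent: bool) -> bool {
    match kind {
        // Never prune on a boolean Bloom filter: assume the value may be present.
        ColumnKind::Boolean => false,
        // Other column types keep the existing behavior.
        _ => filter_says_absent,
    }
}
```

The trade-off is deliberate: a boolean column has at most two distinct non-null values, so its filter rarely excludes a file, while trusting an empty one silently drops rows.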
   
   ## Are these changes tested?
   
   Not yet.
   
   ## Are there any user-facing changes?
   
   This may affect performance in cases where the split block bloom filter 
(SBBF) for a boolean column was written correctly, and thus legitimately 
excludes some data files. With this change, those files will still be scanned. 
   
   There are no changes to the API. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


