fallintoplace opened a new issue, #663:
URL: https://github.com/apache/iceberg-cpp/issues/663

   ### Problem
   
   `ManifestGroup::FilterFiles()` accepts a file-level expression and stores it 
in `file_filter_`, but `ReadEntries()` does not currently build or run an 
evaluator against each `DataFile`. As a result, any file filter other than 
the always-true expression is accepted but silently ignored.
   
   This is primarily a public API correctness issue. Normal table scans still 
apply data and partition filtering through the existing scan path, but direct 
use of `ManifestGroup::FilterFiles()` can return entries that should have been 
filtered out.
   
   ### Expected behavior
   
   `ManifestGroup::FilterFiles()` should either evaluate supported predicates 
against each `DataFile` metadata struct, including partition metadata for the 
manifest partition spec, or fail explicitly for unsupported predicates instead 
of behaving as a silent no-op.
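
The two acceptable outcomes described above can be sketched as follows. This is a minimal, self-contained model, not the iceberg-cpp implementation: `DataFile`, `Predicate`, `Op`, `BindFileFilter`, and the free function `ReadEntries` are hypothetical stand-ins, only `record_count` is treated as a supported field, and an exception stands in for whatever error-reporting mechanism the project actually uses.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily reduced stand-in for the DataFile metadata struct.
struct DataFile {
  std::string file_path;
  int64_t record_count;
};

// Hypothetical predicate: one comparison against a named metadata field.
enum class Op { kGtEq, kLt };
struct Predicate {
  std::string field;
  Op op;
  int64_t literal;
};

// Turns a predicate into a per-file evaluator, or fails explicitly when the
// predicate refers to a field this sketch does not support.
std::function<bool(const DataFile&)> BindFileFilter(const Predicate& p) {
  if (p.field != "record_count") {
    throw std::invalid_argument("unsupported file filter field: " + p.field);
  }
  const int64_t literal = p.literal;
  switch (p.op) {
    case Op::kGtEq:
      return [literal](const DataFile& f) { return f.record_count >= literal; };
    case Op::kLt:
      return [literal](const DataFile& f) { return f.record_count < literal; };
  }
  throw std::logic_error("unhandled operator");
}

// Applies the file filter to every entry; no filter means "keep everything".
std::vector<DataFile> ReadEntries(const std::vector<DataFile>& entries,
                                  const std::optional<Predicate>& file_filter) {
  if (!file_filter) {
    return entries;
  }
  const auto matches = BindFileFilter(*file_filter);
  std::vector<DataFile> kept;
  for (const auto& file : entries) {
    if (matches(file)) {
      kept.push_back(file);
    }
  }
  return kept;
}
```

The point of the sketch is that binding happens before any entry is read, so an unsupported predicate surfaces as an error rather than as an unfiltered result.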
   
   ### Reproduction idea
   
   Create a manifest with two entries whose `record_count` values fall on 
either side of 10, call `FilterFiles(record_count >= 10)`, and observe that 
the entry below the threshold is still returned.
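
The shape of such a reproduction can be illustrated with a small stand-alone model. Nothing here uses the real iceberg-cpp types: `ManifestGroupModel`, `FileFilter`, the reduced `DataFile`, and `CountBelowThreshold` are hypothetical names, and the model only mimics the reported store-but-never-evaluate behavior so that the failing check is easy to see.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, reduced stand-in for the DataFile metadata struct.
struct DataFile {
  std::string file_path;
  int64_t record_count;
};

using FileFilter = std::function<bool(const DataFile&)>;

// Hypothetical model of the reported behavior: the filter is stored by
// FilterFiles() but never consulted by ReadEntries().
class ManifestGroupModel {
 public:
  explicit ManifestGroupModel(std::vector<DataFile> entries)
      : entries_(std::move(entries)) {}

  ManifestGroupModel& FilterFiles(FileFilter filter) {
    file_filter_ = std::move(filter);
    return *this;
  }

  // Returns every entry unchanged; file_filter_ is ignored on purpose here
  // to mirror the silent no-op described in the issue.
  std::vector<DataFile> ReadEntries() const { return entries_; }

 private:
  std::vector<DataFile> entries_;
  FileFilter file_filter_;
};

// Counts returned entries that violate `record_count >= threshold`.
std::size_t CountBelowThreshold(const std::vector<DataFile>& entries,
                                int64_t threshold) {
  std::size_t count = 0;
  for (const auto& file : entries) {
    if (file.record_count < threshold) {
      ++count;
    }
  }
  return count;
}
```

A regression test against the real class would follow the same pattern: build two entries with record counts such as 5 and 20, apply the filter, and require that `CountBelowThreshold(result, 10)` is zero. Against the model above that check fails, which is the behavior the issue reports.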


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

