laserninja commented on PR #16110:
URL: https://github.com/apache/iceberg/pull/16110#issuecomment-4418042377

   Ignored can be the right abstraction. The key semantic difference from 
AlwaysFalse is in or: or(x, AlwaysFalse) correctly simplifies to x, but or(x, 
Ignored) must stay Ignored because the missing column could have matching rows, 
so we can't push down the OR at all.
   
   Proposed semantics:
   
   not(Ignored) → Ignored
   or(x, Ignored) / or(Ignored, x) → Ignored (can't push; might miss rows)
   and(real, Ignored) / and(Ignored, real) → real (safe to push the resolvable 
side; AND with an ignored term can only be more restrictive)
   and(Ignored, Ignored) → Ignored
   convert(): treat Ignored.INSTANCE the same as AlwaysTrue.INSTANCE → NOOP
   
   This gives "correct result in many cases" for AND-heavy filters on partially 
evolved files.
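   
   For illustration only, the rules above could be sketched as a small 
standalone Java class. Everything in it (IgnoredSemanticsSketch, Pred, leaf, 
the string-based convert) is a hypothetical stand-in invented for this sketch; 
it is not the actual ParquetFilters visitor, and the real change may look 
quite different.

```java
public class IgnoredSemanticsSketch {

  /** Hypothetical stand-in for a converted filter; not a real Iceberg or Parquet type. */
  static final class Pred {
    // Sentinel for a predicate on a column that is missing from the file.
    static final Pred IGNORED = new Pred("Ignored");
    static final Pred ALWAYS_FALSE = new Pred("AlwaysFalse");

    final String text;

    private Pred(String text) {
      this.text = text;
    }

    static Pred leaf(String text) {
      return new Pred(text);
    }

    @Override
    public String toString() {
      return text;
    }
  }

  // not(Ignored) -> Ignored
  static Pred not(Pred child) {
    if (child == Pred.IGNORED) {
      return Pred.IGNORED;
    }
    return Pred.leaf("not(" + child + ")");
  }

  // or(x, Ignored) / or(Ignored, x) -> Ignored; or(x, AlwaysFalse) -> x
  static Pred or(Pred left, Pred right) {
    if (left == Pred.IGNORED || right == Pred.IGNORED) {
      return Pred.IGNORED;
    }
    if (left == Pred.ALWAYS_FALSE) {
      return right;
    }
    if (right == Pred.ALWAYS_FALSE) {
      return left;
    }
    return Pred.leaf("or(" + left + ", " + right + ")");
  }

  // and(real, Ignored) / and(Ignored, real) -> real; and(Ignored, Ignored) -> Ignored
  static Pred and(Pred left, Pred right) {
    if (left == Pred.IGNORED) {
      return right; // also covers and(Ignored, Ignored) -> Ignored
    }
    if (right == Pred.IGNORED) {
      return left;
    }
    return Pred.leaf("and(" + left + ", " + right + ")");
  }

  // convert(): Ignored at the top level means nothing is pushed down.
  static String convert(Pred pred) {
    return pred == Pred.IGNORED ? "NOOP" : pred.toString();
  }

  public static void main(String[] args) {
    Pred x = Pred.leaf("x");
    System.out.println(convert(or(x, Pred.ALWAYS_FALSE)));        // x
    System.out.println(convert(or(x, Pred.IGNORED)));             // NOOP
    System.out.println(convert(and(x, Pred.IGNORED)));            // x
    System.out.println(convert(and(Pred.IGNORED, Pred.IGNORED))); // NOOP
    System.out.println(convert(not(Pred.IGNORED)));               // NOOP
  }
}
```

   The sketch compares sentinels by identity, mirroring the INSTANCE-style 
singletons mentioned above, and only encodes the rules exactly as proposed.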
   
   I'll update the PR with:
   
   The Ignored sentinel class and the visitor changes above
   A TableScan-level integration test using the schema evolution scenario 
(write file without column, add column, scan with predicate on the new column)
   Updated unit tests in TestParquetFilters
   
   Do the and semantics look right to you, or would you prefer and(real, 
Ignored) → Ignored (simpler, NOOP the whole thing)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

