Fokko opened a new issue, #22:
URL: https://github.com/apache/iceberg-rust/issues/22

   In (Py)Iceberg we have an expression hierarchy that works very well. In this 
issue, I'll try to explain it, and also convince y'all to use it in 
`iceberg-rust` as well. Disclaimer: I'm an OOP guy, so there are probably some 
things that don't make sense in Rust. I don't think we can even port the whole 
hierarchy, since Rust is not OOP.
   
   The important traits are (directly translated from Python):
   
   ```rust
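   trait Bound {
       // Returns the logical negation of this bound expression.
       // A trait cannot be returned by value, so it is boxed here.
       fn invert(&self) -> Box<dyn Bound>;
   }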
   ```
   
   ```rust
   trait Unbound {
       // Binding resolves the expression against a schema and yields its bound form.
       fn bind(&self, schema: &impl Schema, case_sensitive: bool) -> Box<dyn Bound>;
   }
   ```
   
   (This excludes `Term`, `Reference`, and `BooleanExpression`. Maybe we should 
rename `Unbound` to `UnboundBooleanExpression`, and `Bound` to 
`BoundBooleanExpression`; it is up to you. In the end, naming things is the 
hardest thing in computer science.)
   
   These traits are implemented by operations such as:
   
   - Unary predicates: `IsNull`, `NotNull`, `IsNaN`, `NotNaN`
   - Set predicates: `In`, `NotIn`
   - Literal predicates: `EqualTo`, `NotEqualTo`, `LessThan`, 
`LessThanOrEqual`, `GreaterThan`, `GreaterThanOrEqual` 
   - Negation: `Not`
   - Compositions: `And`, `Or`
   - Literal: `AlwaysTrue`, `AlwaysFalse`
   
   The `invert` method is important later on to rewrite `Not(...)` operations: 
`Not(EqualTo("UserID", 123))` can be rewritten to `NotEqualTo("UserID", 123)`. 
Similarly, a `Not` around a composition [can be rewritten: `!(A and B) == !A or 
!B`](https://github.com/apache/iceberg/blob/da6e611d5d19a08c915fdad51c9f1f147c4e1b91/python/pyiceberg/expressions/__init__.py#L266-L269).
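
To make the rewrite concrete, here is a minimal sketch of how `invert` could remove `Not` nodes. It assumes a single hypothetical `Expr` enum with integer literals instead of the trait hierarchy above; the enum, its variants, and `rewrite_not` are illustrative names, not the PyIceberg or `iceberg-rust` API.

```rust
// Hypothetical, simplified expression tree (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    AlwaysTrue,
    AlwaysFalse,
    EqualTo(String, i64),
    NotEqualTo(String, i64),
    IsNull(String),
    NotNull(String),
    Not(Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Returns the logical negation of `self` without producing a `Not` node.
    fn invert(&self) -> Expr {
        match self {
            Expr::AlwaysTrue => Expr::AlwaysFalse,
            Expr::AlwaysFalse => Expr::AlwaysTrue,
            Expr::EqualTo(c, v) => Expr::NotEqualTo(c.clone(), *v),
            Expr::NotEqualTo(c, v) => Expr::EqualTo(c.clone(), *v),
            Expr::IsNull(c) => Expr::NotNull(c.clone()),
            Expr::NotNull(c) => Expr::IsNull(c.clone()),
            // Double negation: !(!A) == A, with nested `Not`s removed too.
            Expr::Not(inner) => inner.rewrite_not(),
            // De Morgan: !(A and B) == !A or !B, and the dual for `Or`.
            Expr::And(l, r) => Expr::Or(Box::new(l.invert()), Box::new(r.invert())),
            Expr::Or(l, r) => Expr::And(Box::new(l.invert()), Box::new(r.invert())),
        }
    }

    /// Removes every `Not` node by pushing the negation down to the leaves.
    fn rewrite_not(&self) -> Expr {
        match self {
            Expr::Not(inner) => inner.invert(),
            Expr::And(l, r) => Expr::And(Box::new(l.rewrite_not()), Box::new(r.rewrite_not())),
            Expr::Or(l, r) => Expr::Or(Box::new(l.rewrite_not()), Box::new(r.rewrite_not())),
            leaf => leaf.clone(),
        }
    }
}
```

With this sketch, calling `rewrite_not()` on `Not(EqualTo("UserID", 123))` yields `NotEqualTo("UserID", 123)`, and the result of `rewrite_not()` never contains a `Not` node, so later stages would not need to handle negation.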
   
   All the operations come in a bound and an unbound variant. `EqualTo` is in 
the public API, and once it is bound to a schema, it becomes a `BoundEqualTo`. 
Binding is important: say we have the expression `UserID = '123'`; at bind time 
we want to convert it to `UserID = 123`, because UserID is an integer field in 
this case.
   Iceberg is lazy, so UserID might have different compatible types across 
files. If you promote a column along the way from `i32` to `i64`, the UserID 
literal will also be promoted to an `i64` when binding to a file that has been 
written with the newer schema. A few other things happen at bind time:
   
   - If you add a new column, but this column isn't written in an older file, 
then a predicate on it will be converted to `AlwaysFalse()`.
   - When an `IsNull("UserID")` is bound to a `UserID INTEGER NOT NULL` column, 
then this will also convert into an `AlwaysFalse()`.
   - Optional: In PyIceberg we have optimizations, such as `In("UserID", {123})` 
being rewritten to `EqualTo("UserID", 123)`, since there is only one literal.
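
The binding rules above can be sketched roughly as follows. This is an illustration under assumptions, not the PyIceberg or `iceberg-rust` implementation: `Schema`, `Field`, `FieldType`, `Literal`, and `convert` are hypothetical stand-ins, literals arrive as strings, and only `i32`/`i64` columns are modeled.

```rust
use std::collections::HashMap;

// Hypothetical, simplified schema types (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldType { Int, Long }

#[derive(Debug, Clone)]
struct Field { id: i32, field_type: FieldType, required: bool }

struct Schema { fields: HashMap<String, Field> }

impl Schema {
    fn find(&self, name: &str, case_sensitive: bool) -> Option<&Field> {
        if case_sensitive {
            self.fields.get(name)
        } else {
            self.fields.iter().find(|(k, _)| k.eq_ignore_ascii_case(name)).map(|(_, f)| f)
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
enum Literal { Int(i32), Long(i64) }

#[derive(Debug, Clone, PartialEq)]
enum Unbound {
    IsNull(String),
    EqualTo(String, String), // literal exactly as the user wrote it
    In(String, Vec<String>),
}

#[derive(Debug, Clone, PartialEq)]
enum Bound {
    AlwaysFalse,
    IsNull(i32),
    EqualTo(i32, Literal),
    In(i32, Vec<Literal>),
}

// Converts a raw literal to the type of the field it is bound to.
fn convert(raw: &str, ty: FieldType) -> Result<Literal, String> {
    match ty {
        FieldType::Int => raw.parse::<i32>().map(Literal::Int).map_err(|e| e.to_string()),
        FieldType::Long => raw.parse::<i64>().map(Literal::Long).map_err(|e| e.to_string()),
    }
}

impl Unbound {
    fn bind(&self, schema: &Schema, case_sensitive: bool) -> Result<Bound, String> {
        let name = match self {
            Unbound::IsNull(n) | Unbound::EqualTo(n, _) | Unbound::In(n, _) => n,
        };
        // Column not present in this schema: AlwaysFalse, as described above.
        let Some(field) = schema.find(name, case_sensitive) else {
            return Ok(Bound::AlwaysFalse);
        };
        match self {
            // A required (NOT NULL) column can never be null.
            Unbound::IsNull(_) if field.required => Ok(Bound::AlwaysFalse),
            Unbound::IsNull(_) => Ok(Bound::IsNull(field.id)),
            Unbound::EqualTo(_, raw) => {
                Ok(Bound::EqualTo(field.id, convert(raw, field.field_type)?))
            }
            Unbound::In(_, raws) => {
                let mut lits = raws
                    .iter()
                    .map(|r| convert(r, field.field_type))
                    .collect::<Result<Vec<_>, _>>()?;
                // Optional optimization: a single-literal set is an equality.
                if lits.len() == 1 {
                    Ok(Bound::EqualTo(field.id, lits.remove(0)))
                } else {
                    Ok(Bound::In(field.id, lits))
                }
            }
        }
    }
}
```

Because the literal is converted against whichever schema is passed to `bind`, the same unbound `EqualTo("UserID", "123")` would become an `i32` literal for a file written before the promotion and an `i64` literal for one written after it. Returning a `Result` is a design choice of this sketch, so that a literal that cannot be converted (for example `'abc'` on an integer column) surfaces as an error at bind time.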


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

