Fokko opened a new issue, #22:
URL: https://github.com/apache/iceberg-rust/issues/22
In (Py)Iceberg we have a hierarchy that works very well. In this issue, I'll
try to explain it, and also convince y'all to use it in `iceberg-rust` as well.
Disclaimer: I'm an OOP guy, so some of this probably won't make
sense in Rust. I don't think we can even port the whole hierarchy, since Rust
is not OOP.
The important traits are (directly translated from Python):
```rust
trait Bound {
    // A trait cannot be returned by value, so the result is boxed.
    fn invert(&self) -> Box<dyn Bound>;
}
```
```rust
trait Unbound {
    // Binding resolves the expression against a schema and yields its bound form.
    fn bind(&self, schema: &impl Schema, case_sensitive: bool) -> Box<dyn Bound>;
}
```
(This excludes `Term`, `Reference`, and `BooleanExpression`. Maybe we should
name `Unbound` `UnboundBooleanExpression` and `Bound`
`BoundBooleanExpression`; that is up to you. In the end, naming things is the
hardest thing in computer science.)
These traits are implemented by operations such as:
- Unary predicates: `IsNull`, `NotNull`, `IsNaN`, `NotNaN`
- Set predicates: `In`, `NotIn`
- Literal predicates: `EqualTo`, `NotEqualTo`, `LessThan`,
`LessThanOrEqual`, `GreaterThan`, `GreaterThanOrEqual`
- Negation: `Not`
- Compositions: `And`, `Or`
- Literal: `AlwaysTrue`, `AlwaysFalse`
The `invert` method is important later on for rewriting `Not(...)` operations:
`Not(EqualTo("UserID", 123))` can be rewritten to `NotEqualTo("UserID", 123)`.
Similarly, a negated `And` [can be rewritten: `!(A and B) == !A or
!B`](https://github.com/apache/iceberg/blob/da6e611d5d19a08c915fdad51c9f1f147c4e1b91/python/pyiceberg/expressions/__init__.py#L266-L269).
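To make the `Not(...)` rewrite concrete, here is a minimal sketch in Rust. It uses a single enum instead of the trait hierarchy, covers only a handful of the operations, and fixes literals to `i64`. The `Expr` type and the `rewrite_not` helper are hypothetical names for illustration, not an existing `iceberg-rust` API.

```rust
/// Hypothetical, simplified expression tree; variant names mirror the
/// operations listed above.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    AlwaysTrue,
    AlwaysFalse,
    EqualTo(String, i64),
    NotEqualTo(String, i64),
    IsNull(String),
    NotNull(String),
    Not(Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Returns the logical negation of `self` without introducing a `Not` node.
    fn invert(&self) -> Expr {
        match self {
            Expr::AlwaysTrue => Expr::AlwaysFalse,
            Expr::AlwaysFalse => Expr::AlwaysTrue,
            Expr::EqualTo(term, value) => Expr::NotEqualTo(term.clone(), *value),
            Expr::NotEqualTo(term, value) => Expr::EqualTo(term.clone(), *value),
            Expr::IsNull(term) => Expr::NotNull(term.clone()),
            Expr::NotNull(term) => Expr::IsNull(term.clone()),
            // Double negation cancels out; still clean up the inner expression.
            Expr::Not(inner) => inner.rewrite_not(),
            // De Morgan: !(A and B) == !A or !B, and the dual for `Or`.
            Expr::And(l, r) => Expr::Or(Box::new(l.invert()), Box::new(r.invert())),
            Expr::Or(l, r) => Expr::And(Box::new(l.invert()), Box::new(r.invert())),
        }
    }

    /// Removes every `Not` node by replacing it with the inverse of its child.
    fn rewrite_not(&self) -> Expr {
        match self {
            Expr::Not(inner) => inner.invert(),
            Expr::And(l, r) => Expr::And(Box::new(l.rewrite_not()), Box::new(r.rewrite_not())),
            Expr::Or(l, r) => Expr::Or(Box::new(l.rewrite_not()), Box::new(r.rewrite_not())),
            other => other.clone(),
        }
    }
}
```

With this sketch, calling `rewrite_not` on `Not(EqualTo("UserID", 123))` yields `NotEqualTo("UserID", 123)`, so later stages never have to handle `Not` at all.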
All the operations come in a bound and an unbound variant. `EqualTo` is in the
public API, and once it is bound to a schema, it becomes a `BoundEqualTo`. Binding
is important: say we have the expression `UserID = '123'`; at bind time we
want to convert it to `UserID = 123`, because `UserID` is an
integer field in this case.
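A rough sketch of what binding could look like in Rust, assuming a tiny stand-in schema. `Field`, `Schema`, `FieldType`, `Literal`, and the `find` and `to` helpers are all made up for this example and are not taken from PyIceberg or `iceberg-rust`; only `EqualTo`, `BoundEqualTo`, `bind`, and `case_sensitive` come from the description above.

```rust
// Hypothetical stand-ins for the real schema and literal types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldType {
    Int,
    Long,
}

#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Int(i32),
    Long(i64),
    Str(String),
}

struct Field {
    id: i32,
    name: String,
    field_type: FieldType,
}

struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    /// Looks a field up by name, optionally ignoring ASCII case.
    fn find(&self, name: &str, case_sensitive: bool) -> Option<&Field> {
        self.fields.iter().find(|f| {
            if case_sensitive {
                f.name == name
            } else {
                f.name.eq_ignore_ascii_case(name)
            }
        })
    }
}

impl Literal {
    /// Converts the literal to the type of the field it is compared against.
    fn to(&self, target: FieldType) -> Result<Literal, String> {
        match (self, target) {
            (Literal::Int(v), FieldType::Int) => Ok(Literal::Int(*v)),
            // Type promotion: an i32 literal widens to i64.
            (Literal::Int(v), FieldType::Long) => Ok(Literal::Long(i64::from(*v))),
            (Literal::Long(v), FieldType::Long) => Ok(Literal::Long(*v)),
            (Literal::Str(s), FieldType::Int) => {
                s.parse::<i32>().map(Literal::Int).map_err(|e| e.to_string())
            }
            (Literal::Str(s), FieldType::Long) => {
                s.parse::<i64>().map(Literal::Long).map_err(|e| e.to_string())
            }
            (lit, ty) => Err(format!("cannot convert {:?} to {:?}", lit, ty)),
        }
    }
}

/// Unbound predicate: refers to the column by name, literal as the user typed it.
#[derive(Debug, PartialEq)]
struct EqualTo {
    name: String,
    literal: Literal,
}

/// Bound predicate: refers to the column by field id, literal already converted.
#[derive(Debug, PartialEq)]
struct BoundEqualTo {
    field_id: i32,
    literal: Literal,
}

impl EqualTo {
    fn bind(&self, schema: &Schema, case_sensitive: bool) -> Result<BoundEqualTo, String> {
        let field = schema
            .find(&self.name, case_sensitive)
            .ok_or_else(|| format!("unknown column: {}", self.name))?;
        Ok(BoundEqualTo {
            field_id: field.id,
            literal: self.literal.to(field.field_type)?,
        })
    }
}
```

In this sketch, binding `EqualTo { name: "UserID", literal: Str("123") }` against a schema where `UserID` is an `Int` field produces a `BoundEqualTo` holding `Literal::Int(123)`. Returning a `Result` is a design choice of the sketch: it lets a bad literal or an unknown column surface as an error rather than a panic.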
Iceberg is lazy, so `UserID` might have different compatible types: if
you promote a column along the way from `i32` to `i64`, the `UserID` literal will
also be promoted to an `i64` when binding to a file that has been written with
the newer schema. Some other things that happen at bind time:
- If you add a new column, but this column isn't written in an older file,
then a predicate on it will be converted to an `AlwaysFalse()`.
- When an `IsNull("UserID")` is bound to a `UserID INTEGER NOT NULL` column,
then this will also be converted into an `AlwaysFalse()`.
- Optional: In PyIceberg we have an optimization where `In("UserID", {123})`
is rewritten to `EqualTo("UserID", 123)`, since there is only one literal.
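The three bind-time rewrites in the list above could be sketched roughly as follows. This is an illustration under simplifying assumptions (only `IsNull` and `In`, `i64` literals, a flat slice of fields). The `Field` struct and the free `bind` function are hypothetical, and the missing-column case simply follows the description in this issue.

```rust
/// Hypothetical unbound predicates, referring to columns by name.
#[derive(Debug, Clone, PartialEq)]
enum Unbound {
    IsNull(String),
    In(String, Vec<i64>),
}

/// Hypothetical bound predicates, referring to columns by field id.
#[derive(Debug, Clone, PartialEq)]
enum Bound {
    AlwaysFalse,
    IsNull(i32),
    EqualTo(i32, i64),
    In(i32, Vec<i64>),
}

/// Minimal stand-in for a schema field.
struct Field {
    id: i32,
    name: String,
    required: bool, // true for a NOT NULL column
}

fn bind(expr: &Unbound, fields: &[Field]) -> Bound {
    let name = match expr {
        Unbound::IsNull(n) | Unbound::In(n, _) => n,
    };
    let field = match fields.iter().find(|f| &f.name == name) {
        Some(field) => field,
        // Column not present in this file's schema: AlwaysFalse, as described above.
        None => return Bound::AlwaysFalse,
    };
    match expr {
        // A required (NOT NULL) column can never be null.
        Unbound::IsNull(_) if field.required => Bound::AlwaysFalse,
        Unbound::IsNull(_) => Bound::IsNull(field.id),
        Unbound::In(_, values) => {
            // Treat the literals as a set, like the Python `{123}`.
            let mut values = values.clone();
            values.sort_unstable();
            values.dedup();
            match values.as_slice() {
                // A single literal is just an equality check.
                [only] => Bound::EqualTo(field.id, *only),
                _ => Bound::In(field.id, values),
            }
        }
    }
}
```

Doing these simplifications while binding means evaluators only see the reduced form: for example, `In("UserID", {123})` arrives as `EqualTo` on the field id, and an `IsNull` on a required column never reaches the evaluator at all.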
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]