rdblue commented on code in PR #16092:
URL: https://github.com/apache/iceberg/pull/16092#discussion_r3244853485
##########
core/src/main/java/org/apache/iceberg/TrackingStruct.java:
##########
@@ -246,4 +266,78 @@ public String toString() {
.add("replaced_positions", replacedPositions == null ? "null" :
"(binary)")
.toString();
}
+
+ static class Builder {
+ private EntryStatus status = null;
+ private Long snapshotId = null;
+ private Long dataSequenceNumber = null;
+ private Long fileSequenceNumber = null;
+ private Long dvSnapshotId = null;
+ private Long firstRowId = null;
+ private byte[] deletedPositions = null;
+ private byte[] replacedPositions = null;
+
+ Builder status(EntryStatus entryStatus) {
Review Comment:
Entry status has a specific lifecycle: when files are written, they are
`ADDED`. When read and written again, they move to `EXISTING`. And when either
an `ADDED` or `EXISTING` entry is deleted, it is set to `DELETED`. Also, the
`snapshotId` field is related: it is required for `ADDED`, must not be modified
for `EXISTING`, and again must be supplied for `DELETED`. (I think `REPLACED`
will have similar behavior and it is easier to think about the API by omitting
it for now.)
I think that it makes more sense to have the builder handle these cases than
to allow creating possibly misconfigured `Tracking` structs. I would handle
these requirements in a way that match the lifecycle:
```java
Builder added(long snapshotId); // creates a new Tracking for an addition
Builder existing(Tracking); // creates Tracking for an existing file based
on Tracking read from a manifest
Builder deleted(Tracking, long snapshotId); // creates Tracking to delete
a file
```
Passing the existing tracking in makes it so the builder can keep values
that must be preserved. And this approach allows the builder to use the initial
intent (added/existing/deleted) to perform better validation. For instance, you
can't call `dvSnapshotId` for a deleted entry -- it has to come from the
existing/added entry and can't be changed.
I should also note that a builder may not be the right way to create
`Tracking` instances. It could be that we have a builder for `ADDED`, but then
use instance methods to change from one state to the next, like
`toExisting(...)`. I think going through the thought exercise of restricting
what you can do through the builder is the right path forward. If it doesn't
really work then we can consider other approaches.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]