rdblue commented on code in PR #16689:
URL: https://github.com/apache/iceberg/pull/16689#discussion_r3364092997


##########
core/src/main/java/org/apache/iceberg/Tracking.java:
##########
@@ -28,13 +28,13 @@ interface Tracking {
           0,
           "status",
           Types.IntegerType.get(),
-          "Entry status: 0=existing, 1=added, 2=deleted, 3=replaced");
+          "Entry status: 0=existing, 1=added, 2=deleted, 3=replaced, 4=modified");
   Types.NestedField SNAPSHOT_ID =
       Types.NestedField.optional(
           1,
           "snapshot_id",
           Types.LongType.get(),
-          "Snapshot ID where the file was added or deleted");
+          "Snapshot ID where the file was added, deleted, replaced, or modified");

Review Comment:
   Have we agreed to modify the snapshot ID for a replaced entry? I thought 
that we were not going to change replaced entries.
   
   We change the snapshot ID for deleted entries but not for existing entries, 
so there's precedent both ways. If you're scanning for changes, the snapshot ID 
is useful for filtering out changes that are left over from older snapshots. 
For instance, I may rewrite a manifest and delete a file in it. If I'm later 
scanning that manifest for changes, I would be able to check whether the delete 
entry is for the snapshot ID I'm getting changes for.
   
   The counter-argument is that the manifest would probably only be scanned for 
changes if you're looking for changes that would match. In order to scan that 
manifest, you'd first check its snapshot ID (when it was added) and not scan 
otherwise.
   
   Overall, I think the right thing is to update the snapshot ID as you have 
here. That way if any implementation reads files it doesn't need to, it has 
enough information to filter out the entries.
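   
   A minimal sketch of that entry-level filter, assuming a hypothetical 
`Entry` class and `deletesFor` helper (neither is part of Iceberg's API; only 
the status code and the optional `snapshot_id` field come from the diff above):

```java
import java.util.ArrayList;
import java.util.List;

public class ChangeScanSketch {
  // Status code taken from the field doc in the diff: 2=deleted.
  static final int DELETED = 2;

  // Hypothetical stand-in for a tracked entry; not an Iceberg class.
  static final class Entry {
    final int status;
    final Long snapshotId; // optional field, so it may be null
    final String path;

    Entry(int status, Long snapshotId, String path) {
      this.status = status;
      this.snapshotId = snapshotId;
      this.path = path;
    }
  }

  // Keep only delete entries written by the snapshot being scanned for changes.
  // Entries left over from older snapshots carry a different snapshot ID.
  static List<Entry> deletesFor(List<Entry> entries, long targetSnapshotId) {
    List<Entry> result = new ArrayList<>();
    for (Entry e : entries) {
      boolean sameSnapshot = e.snapshotId != null && e.snapshotId == targetSnapshotId;
      if (e.status == DELETED && sameSnapshot) {
        result.add(e);
      }
    }
    return result;
  }
}
```

   The same check would apply to replaced or modified entries if their 
snapshot ID is updated as proposed.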
   
   Good to note in the spec @stevenzwu and @amogh-jahagirdar.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

