rdblue commented on code in PR #16263:
URL: https://github.com/apache/iceberg/pull/16263#discussion_r3221881982


##########
core/src/main/java/org/apache/iceberg/ManifestReader.java:
##########
@@ -417,14 +417,9 @@ public ManifestEntry<F> apply(ManifestEntry<F> entry) {
         }
       };
     } else {
-      // data file's first_row_id is null when the manifest's first_row_id is null
-      return entry -> {
-        if (entry.file() instanceof BaseFile) {
-          ((BaseFile<?>) entry.file()).setFirstRowId(null);
-        }
-
-        return entry;
-      };
+      // Preserve the source entry’s first row ID even if the manifest hasn’t assigned one since it
+      // may be EXISTING
+      return Function.identity();

Review Comment:
   Okay, after going through the repro test, I think this is a legitimate case: 
manifest compaction reads with a `ManifestFile` that doesn't have an assigned 
`first_row_id`. That violates the assumption here, because we can have manifests 
without an assigned `first_row_id` and we should still read and pass through 
the `first_row_id` from files.
   
   I think the fix needs to distinguish between these cases. We need an 
`idAssigner` for committed manifests (the existing one here) and an `idAssigner` 
for uncommitted manifests that preserves the `first_row_id`, as this change 
does. We can't commit this fix as is because it breaks the defensive assignment 
we added for the v2/v1 snapshot case.
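
   To make the split concrete, here is a rough sketch of the two assigners. Everything in it is hypothetical: `Entry`, `RowIdAssignerSketch`, `committedAssigner`, and `uncommittedAssigner` are simplified stand-ins, not the actual `ManifestReader` code or Iceberg API, and the non-null branch is only an assumed approximation of how assignment works.

```java
import java.util.function.Function;

// Hypothetical, simplified stand-in for a manifest entry; not Iceberg's ManifestEntry.
final class Entry {
  final long recordCount;
  Long firstRowId; // null means "no first_row_id assigned"

  Entry(long recordCount, Long firstRowId) {
    this.recordCount = recordCount;
    this.firstRowId = firstRowId;
  }
}

final class RowIdAssignerSketch {
  private RowIdAssignerSketch() {}

  // Assumed behavior for committed manifests: the manifest's first_row_id is authoritative.
  static Function<Entry, Entry> committedAssigner(Long manifestFirstRowId) {
    if (manifestFirstRowId == null) {
      // Defensive case: a committed manifest without a first_row_id must not
      // expose row IDs from its files, so clear whatever the file carries.
      return entry -> {
        entry.firstRowId = null;
        return entry;
      };
    }

    long[] next = {manifestFirstRowId};
    return entry -> {
      // Assign only to entries that do not already carry a first_row_id.
      if (entry.firstRowId == null) {
        entry.firstRowId = next[0];
        next[0] += entry.recordCount;
      }
      return entry;
    };
  }

  // Assumed behavior for uncommitted manifests (for example, ones being rewritten
  // by compaction): pass the file's first_row_id through unchanged.
  static Function<Entry, Entry> uncommittedAssigner() {
    return Function.identity();
  }

  public static void main(String[] args) {
    Entry existing = new Entry(100L, 42L);
    System.out.println(uncommittedAssigner().apply(existing).firstRowId); // prints 42
    System.out.println(committedAssigner(null).apply(existing).firstRowId); // prints null
  }
}
```

   The point of the sketch is that the caller, not the null check on the manifest's `first_row_id`, decides which behavior applies, so the compaction path can preserve IDs without weakening the defensive path.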



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to