rymurr commented on a change in pull request #1080:
URL: https://github.com/apache/iceberg/pull/1080#discussion_r433760536



##########
File path: core/src/main/java/org/apache/iceberg/AllManifestsTable.java
##########
@@ -136,7 +136,7 @@ protected long targetSplitSize(TableOperations ops) {
         } else {
           return StaticDataTask.of(
               ops.io().newInputFile(ops.current().file().location()),
-              snap.manifests(),
+              snap.dataManifests(), // if the manifest list is missing, the 
table must be v1 and has no delete manifests

Review comment:
       I am not sure I understand the comment here or why `dataManifests` is 
used instead of `allManifests`

##########
File path: core/src/test/java/org/apache/iceberg/TableTestBase.java
##########
@@ -224,10 +224,10 @@ void validateSnapshot(Snapshot old, Snapshot snap, long 
sequenceNumber, DataFile
   }
 
   void validateSnapshot(Snapshot old, Snapshot snap, Long sequenceNumber, 
DataFile... newFiles) {
-    List<ManifestFile> oldManifests = old != null ? old.manifests() : 
ImmutableList.of();
+    List<ManifestFile> oldManifests = old != null ? old.dataManifests() : 
ImmutableList.of();

Review comment:
       This and two other tests use `dataManifests()` where all other tests 
have been changed to `allManifests()`. Is this intentional?
   
   Other tests:
   `TestRemoveSnapshots`
   `RewriteManifestsAction`

##########
File path: core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java
##########
@@ -174,13 +173,13 @@ private ManifestFile copyManifest(ManifestFile manifest) {
 
     validateFilesCounts();
 
-    // TODO: add sequence numbers here
     Iterable<ManifestFile> newManifestsWithMetadata = Iterables.transform(
         Iterables.concat(newManifests, addedManifests, 
rewrittenAddedManifests),
         manifest -> 
GenericManifestFile.copyOf(manifest).withSnapshotId(snapshotId()).build());
 
     // put new manifests at the beginning
-    List<ManifestFile> apply = new ArrayList<>();
+    List<ManifestFile> apply = Lists.newArrayList();
+    apply.addAll(base.currentSnapshot().deleteManifests());

Review comment:
       Here data manifests are (potentially) rewritten while keeping deletes 
the same. Again with deletes at the front. Do I understand correctly? The 
comment is a bit misleading as immediately below it old delete manifests are 
added to the list.

##########
File path: core/src/main/java/org/apache/iceberg/FastAppend.java
##########
@@ -122,6 +122,10 @@ private ManifestFile copyManifest(ManifestFile manifest) {
   public List<ManifestFile> apply(TableMetadata base) {
     List<ManifestFile> newManifests = Lists.newArrayList();
 
+    if (base.currentSnapshot() != null) {
+      newManifests.addAll(base.currentSnapshot().deleteManifests());
+    }

Review comment:
       The addition of delete files in `newManifests` far above the addition of 
data files threw me a bit. Is it intentional to ensure the delete files are at 
the front of the list?

##########
File path: core/src/main/java/org/apache/iceberg/ScanSummary.java
##########
@@ -159,7 +159,7 @@ private void removeTimeFilters(List<Expression> 
expressions, Expression expressi
       removeTimeFilters(filters, Expressions.rewriteNot(scan.filter()));
       Expression rowFilter = joinFilters(filters);
 
-      Iterable<ManifestFile> manifests = table.currentSnapshot().manifests();
+      Iterable<ManifestFile> manifests = 
table.currentSnapshot().dataManifests();

Review comment:
       Is this explicitly ignoring the effect of deleted rows on partition 
metrics or is it just that you are short circuiting any delete files (as we 
can't use them anyways)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to