rymurr commented on a change in pull request #1080:
URL: https://github.com/apache/iceberg/pull/1080#discussion_r433760536
##########
File path: core/src/main/java/org/apache/iceberg/AllManifestsTable.java
##########
@@ -136,7 +136,7 @@ protected long targetSplitSize(TableOperations ops) {
} else {
return StaticDataTask.of(
ops.io().newInputFile(ops.current().file().location()),
- snap.manifests(),
+ snap.dataManifests(), // if the manifest list is missing, the
table must be v1 and has no delete manifests
Review comment:
I am not sure I understand the comment here or why `dataManifests` is
used instead of `allManifests`
##########
File path: core/src/test/java/org/apache/iceberg/TableTestBase.java
##########
@@ -224,10 +224,10 @@ void validateSnapshot(Snapshot old, Snapshot snap, long
sequenceNumber, DataFile
}
void validateSnapshot(Snapshot old, Snapshot snap, Long sequenceNumber,
DataFile... newFiles) {
- List<ManifestFile> oldManifests = old != null ? old.manifests() :
ImmutableList.of();
+ List<ManifestFile> oldManifests = old != null ? old.dataManifests() :
ImmutableList.of();
Review comment:
This and two other tests use `dataManifests()` where all other tests
have been changed to `allManifests()`. Is this intentional?
Other tests:
`TestRemoveSnapshots`
`RewriteManifestsAction`
##########
File path: core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java
##########
@@ -174,13 +173,13 @@ private ManifestFile copyManifest(ManifestFile manifest) {
validateFilesCounts();
- // TODO: add sequence numbers here
Iterable<ManifestFile> newManifestsWithMetadata = Iterables.transform(
Iterables.concat(newManifests, addedManifests,
rewrittenAddedManifests),
manifest ->
GenericManifestFile.copyOf(manifest).withSnapshotId(snapshotId()).build());
// put new manifests at the beginning
- List<ManifestFile> apply = new ArrayList<>();
+ List<ManifestFile> apply = Lists.newArrayList();
+ apply.addAll(base.currentSnapshot().deleteManifests());
Review comment:
Here data manifests are (potentially) rewritten while keeping deletes
the same. Again with deletes at the front. Do I understand correctly? The
comment is a bit misleading as immediately below it old delete manifests are
added to the list.
##########
File path: core/src/main/java/org/apache/iceberg/FastAppend.java
##########
@@ -122,6 +122,10 @@ private ManifestFile copyManifest(ManifestFile manifest) {
public List<ManifestFile> apply(TableMetadata base) {
List<ManifestFile> newManifests = Lists.newArrayList();
+ if (base.currentSnapshot() != null) {
+ newManifests.addAll(base.currentSnapshot().deleteManifests());
+ }
Review comment:
The addition of delete files in `newManifests` far above the addition of
data files threw me a bit. Is it intentional to ensure the delete files are at
the front of the list?
##########
File path: core/src/main/java/org/apache/iceberg/ScanSummary.java
##########
@@ -159,7 +159,7 @@ private void removeTimeFilters(List<Expression>
expressions, Expression expressi
removeTimeFilters(filters, Expressions.rewriteNot(scan.filter()));
Expression rowFilter = joinFilters(filters);
- Iterable<ManifestFile> manifests = table.currentSnapshot().manifests();
+ Iterable<ManifestFile> manifests =
table.currentSnapshot().dataManifests();
Review comment:
Is this explicitly ignoring the effect of deleted rows on partition
metrics or is it just that you are short circuiting any delete files (as we
can't use them anyways)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]