rdblue commented on a change in pull request #1080:
URL: https://github.com/apache/iceberg/pull/1080#discussion_r434004579
##########
File path: core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java
##########
@@ -174,13 +173,13 @@ private ManifestFile copyManifest(ManifestFile manifest) {
validateFilesCounts();
- // TODO: add sequence numbers here
Iterable<ManifestFile> newManifestsWithMetadata = Iterables.transform(
Iterables.concat(newManifests, addedManifests,
rewrittenAddedManifests),
manifest ->
GenericManifestFile.copyOf(manifest).withSnapshotId(snapshotId()).build());
// put new manifests at the beginning
- List<ManifestFile> apply = new ArrayList<>();
+ List<ManifestFile> apply = Lists.newArrayList();
+ apply.addAll(base.currentSnapshot().deleteManifests());
Review comment:
We should probably update the comment to include delete handling. We put
new manifests at the front of the list because those are the ones most likely
to have data for a query when writes align with reads (recent hours are read
more often than data that's months old).
I guess we don't really need the delete manifests at the start of the list.
We could put those at the end since they get split out into a separate list.
What matters is scanning the recent manifests first when planning jobs, so
that engines like Presto, which run the query and planning concurrently, get
data faster.
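A minimal sketch of the ordering suggested above, not the actual change to `BaseRewriteManifests`. It assumes plain strings stand in for `ManifestFile`; the class `ManifestOrdering`, the method `orderForPlanning`, and its three input lists are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ManifestOrdering {
  // Hypothetical helper: strings stand in for Iceberg's ManifestFile.
  static List<String> orderForPlanning(
      List<String> newManifests,
      List<String> existingDataManifests,
      List<String> deleteManifests) {
    List<String> apply = new ArrayList<>();
    // New manifests go first: they are the most likely to match queries
    // that read recent data.
    apply.addAll(newManifests);
    // Older data manifests follow.
    apply.addAll(existingDataManifests);
    // Delete manifests go last: they are split out into a separate list
    // later, so their position does not affect scan order.
    apply.addAll(deleteManifests);
    return apply;
  }

  public static void main(String[] args) {
    List<String> ordered = orderForPlanning(
        List.of("new-1", "new-2"),
        List.of("old-1"),
        List.of("delete-1"));
    System.out.println(ordered);
  }
}
```

With this layout, a planner that iterates the list in order reaches the recent data manifests before anything else, which is the property that matters for engines that plan and execute concurrently.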