rdblue commented on a change in pull request #2865:
URL: https://github.com/apache/iceberg/pull/2865#discussion_r676778611
##########
File path: core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java
##########
@@ -309,6 +327,31 @@ protected void validateDataFilesExist(TableMetadata base,
Long startingSnapshotI
VALIDATE_DATA_FILES_EXIST_SKIP_DELETE_OPERATIONS :
VALIDATE_DATA_FILES_EXIST_OPERATIONS;
+ Pair<List<ManifestFile>, Set<Long>> history =
+ validationHistory(base, startingSnapshotId, matchingOperations,
ManifestContent.DATA);
+ List<ManifestFile> manifests = history.first();
+ Set<Long> newSnapshots = history.second();
+
+ ManifestGroup matchingDeletesGroup = new ManifestGroup(ops.io(),
manifests, ImmutableList.of())
+ .filterManifestEntries(entry -> entry.status() !=
ManifestEntry.Status.ADDED &&
+ newSnapshots.contains(entry.snapshotId()) &&
requiredDataFiles.contains(entry.file().path()))
+ .specsById(base.specsById())
+ .ignoreExisting();
+
+ try (CloseableIterator<ManifestEntry<DataFile>> deletes =
matchingDeletesGroup.entries().iterator()) {
+ if (deletes.hasNext()) {
+ throw new ValidationException("Cannot commit, missing data files: %s",
Review comment:
This is an existing error message. We can fix it, but I'd prefer to do
it in a separate commit so that we don't have changes to more test cases than
necessary. I also plan to fix the initial validation in `RewriteFiles` later
for the same reason. Is it okay to leave this as-is?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]