rdblue commented on a change in pull request #928: Add failure handling in cleanup while reading snapshot manifest-list files
URL: https://github.com/apache/incubator-iceberg/pull/928#discussion_r409717732
##########
File path: core/src/main/java/org/apache/iceberg/RemoveSnapshots.java
##########
@@ -201,112 +201,121 @@ private void cleanExpiredFiles(List<Snapshot> snapshots, Set<Long> validIds, Set
     // find manifests to clean up that are still referenced by a valid snapshot, but written by an expired snapshot
     Set<String> validManifests = Sets.newHashSet();
     Set<ManifestFile> manifestsToScan = Sets.newHashSet();
-    for (Snapshot snapshot : snapshots) {
-      try (CloseableIterable<ManifestFile> manifests = readManifestFiles(snapshot)) {
-        for (ManifestFile manifest : manifests) {
-          validManifests.add(manifest.path());
-
-          long snapshotId = manifest.snapshotId();
-          // whether the manifest was created by a valid snapshot (true) or an expired snapshot (false)
-          boolean fromValidSnapshots = validIds.contains(snapshotId);
-          // whether the snapshot that created the manifest was an ancestor of the table state
-          boolean isFromAncestor = ancestorIds.contains(snapshotId);
-          // whether the changes in this snapshot have been picked into the current table state
-          boolean isPicked = pickedAncestorSnapshotIds.contains(snapshotId);
-          // if the snapshot that wrote this manifest is no longer valid (has expired), then delete its deleted files.
-          // note that this is only for expired snapshots that are in the current table state
-          if (!fromValidSnapshots && (isFromAncestor || isPicked) && manifest.hasDeletedFiles()) {
-            manifestsToScan.add(manifest.copy());
-          }
-        }
-
-      } catch (IOException e) {
-        throw new RuntimeIOException(e,
-            "Failed to close manifest list: %s", snapshot.manifestListLocation());
-      }
-    }
+    Tasks.foreach(snapshots).noRetry().suppressFailureWhenFinished()
Review comment:
I think we should retry in `findFilesToDelete`. The most important part to
retry is locating the files that need to be deleted. Not reading a manifest
list could lead to hundreds of orphaned files.
I think the original rationale for not retrying deletes was that we expect more failures there, from files that have already been deleted. If a file is missing, then we shouldn't try to delete it 4 times... but other than in our environment, where we use bucket policies in some places, I'm not sure where that would happen. Also, deletes of metadata files that have probably just been read should be retried.
So I think it's reasonable to add a retry for the delete methods, too.
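The behavior being requested here, retrying the read side rather than suppressing its first failure, could be sketched with a small generic helper. This is an illustrative, self-contained example, not Iceberg's actual `Tasks` API; `withRetry` and `maxAttempts` are hypothetical names:

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of the retry behavior suggested in the review:
// retry the read path (finding files to delete), since a failed
// manifest-list read can orphan many files, while keeping retries
// bounded so a file that is genuinely gone doesn't cause endless work.
public class RetrySketch {
  public static <T> T withRetry(int maxAttempts, Callable<T> task) throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.call();
      } catch (Exception e) {
        last = e;  // remember the failure and try the next attempt
      }
    }
    throw last;  // all attempts failed; surface the last error
  }
}
```

In the PR itself, the equivalent change would presumably be to replace `noRetry()` with a bounded retry on the `Tasks.foreach` chain that reads manifest lists, while still suppressing failures on deletes of files that may already be gone.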
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]