rdsr commented on a change in pull request #350: Add dropTable purge option to Catalog API
URL: https://github.com/apache/incubator-iceberg/pull/350#discussion_r311763069
########## File path: core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java ##########
@@ -119,4 +133,81 @@ private Table loadMetadataTable(TableIdentifier identifier, TableType type) {
   protected abstract TableOperations newTableOps(TableIdentifier tableIdentifier);
   protected abstract String defaultWarehouseLocation(TableIdentifier tableIdentifier);
+
+  /**
+   * Drops all data and metadata files referenced by TableMetadata.
+   * <p>
+   * This should be called by dropTable implementations to clean up table files once the table has been dropped in the
+   * metastore.
+   *
+   * @param io a FileIO to use for deletes
+   * @param metadata the last valid TableMetadata instance for a dropped table.
+   */
+  protected static void dropTableData(FileIO io, TableMetadata metadata) {
+    // Reads and deletes are done using Tasks.foreach(...).suppressFailureWhenFinished to complete
+    // as much of the delete work as possible and avoid orphaned data or manifest files.
+
+    Set<String> manifestListsToDelete = Sets.newHashSet();
+    Set<ManifestFile> manifestsToDelete = Sets.newHashSet();
+    for (Snapshot snapshot : metadata.snapshots()) {
+      manifestsToDelete.addAll(snapshot.manifests());
+      // add the manifest list to the delete set, if present
+      if (snapshot.manifestListLocation() != null) {
+        manifestListsToDelete.add(snapshot.manifestListLocation());
+      }
+    }
+
+    LOG.info("Manifests to delete: {}", Joiner.on(", ").join(manifestsToDelete));
+
+    // run all of the deletes
+
+    deleteFiles(io, manifestsToDelete);
+
+    Tasks.foreach(Iterables.transform(manifestsToDelete, ManifestFile::path))
+        .noRetry().suppressFailureWhenFinished()

Review comment:
If we do parallelize, maybe there's scope for reusing the `deleteFiles` method with a little more parameterization?
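One way the suggestion could play out is a single best-effort delete helper that takes the degree of parallelism as a parameter, so the manifest, manifest-list, and data-file deletes all go through the same code path. The sketch below is self-contained and uses only `java.util.concurrent`; `BestEffortDelete`, `deleteAll`, and its parameters are hypothetical names, not Iceberg's `Tasks` API and not the PR's actual `deleteFiles` signature. A `null` executor keeps the deletes serial; passing a pool parallelizes them. In both modes every path is attempted and failures are collected instead of aborting the run, in line with the diff's stated goal of completing as much of the delete work as possible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class BestEffortDelete {

  // Hypothetical helper (not Iceberg's Tasks API): tries to delete every path and
  // returns the ones that failed instead of stopping at the first error.
  // pool == null runs the deletes serially on the calling thread.
  static List<String> deleteAll(Iterable<String> paths, Consumer<String> deleteFn,
                                ExecutorService pool) throws InterruptedException {
    List<String> failed = new ArrayList<>();
    if (pool == null) {
      for (String path : paths) {
        try {
          deleteFn.accept(path);
        } catch (RuntimeException e) {
          failed.add(path); // record the failure and keep going
        }
      }
      return failed;
    }

    // Parallel mode: submit one task per path, remembering which future belongs to which path.
    List<String> submitted = new ArrayList<>();
    List<Future<?>> futures = new ArrayList<>();
    for (String path : paths) {
      submitted.add(path);
      futures.add(pool.submit(() -> deleteFn.accept(path)));
    }
    // Wait for every task; a task that threw surfaces here as an ExecutionException.
    for (int i = 0; i < futures.size(); i++) {
      try {
        futures.get(i).get();
      } catch (ExecutionException e) {
        failed.add(submitted.get(i));
      }
    }
    return failed;
  }

  public static void main(String[] args) throws InterruptedException {
    // Stand-in for a real delete call such as FileIO's deleteFile; fails for one path.
    Consumer<String> fakeDelete = path -> {
      if (path.endsWith("bad.avro")) {
        throw new RuntimeException("simulated delete failure: " + path);
      }
    };
    List<String> paths = List.of("m1.avro", "bad.avro", "m2.avro");

    System.out.println(deleteAll(paths, fakeDelete, null)); // prints [bad.avro]

    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      System.out.println(deleteAll(paths, fakeDelete, pool)); // prints [bad.avro]
    } finally {
      pool.shutdown();
    }
  }
}
```

Returning the failed paths rather than throwing leaves the caller free to log them and move on. Retry behavior (the diff uses `noRetry()`) would be one more natural parameter for such a helper; it is left out of this sketch.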