felixschneider99 opened a new pull request, #15614: URL: https://github.com/apache/iceberg/pull/15614
Closes #11023
Fixes #14743

Continues work from #11317 (originally authored by @twuebi)

## Background

Spark's `DROP TABLE ... PURGE` implementation uses a client-side approach:

1. List and delete all files using Spark/FileIO
2. Send a plain `DROP TABLE` request to the catalog (purge flag always `false`)

This means the `PURGE` keyword in Spark SQL has **no effect** on REST catalogs: the purge request is never forwarded (see #14743). This breaks REST catalogs that require `purgeRequested=true` (e.g. AWS S3 Tables) and prevents implementing UNDROP/soft-delete features.

As discussed in the Apache Iceberg Catalog Community Sync (Jan 2025, notes from @RussellSpitzer), the community agreed on a path forward:

1. Add an opt-in flag to delegate purge to the REST catalog (this PR)
2. Deprecate client-side purge for Spark + REST catalogs
3. In Iceberg 2.0, make catalog-delegated purge the default

## Changes

- **`CatalogProperties`**: Added `rest.catalog-purge` catalog property (`REST_CATALOG_PURGE`, default `false`)
- **`CachingCatalog`**: Added `wrapped_is_instance()` for catalog type detection
- **`SparkCatalog`**: Enhanced `purgeTable()` to detect REST catalogs and delegate the purge flag when `rest.catalog-purge=true`; added deprecation warning for client-side purge with REST catalogs
- **`TestRestDropPurgeTable`**: New test validating purge delegation to a REST catalog

## Behavior

When `rest.catalog-purge=true` is set in catalog properties:

- Spark skips client-side file deletion
- Spark sends `DROP TABLE` with `purgeRequested=true` to the REST catalog
- The catalog is responsible for the actual file removal (enabling UNDROP/soft-delete)

All non-REST catalog implementations are **unaffected**.
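
For readers who want the behavior in one place, here is a minimal Python sketch of the decision described above. It is a simplified model, not the code in `SparkCatalog`: `DropPlan`, `plan_drop`, and `rest_catalog_conf` are hypothetical names introduced for illustration, and the catalog name and URI in the usage note are placeholders. The only assumptions beyond this description are the usual `spark.sql.catalog.<name>.<property>` convention for passing catalog properties to Spark and the `type=rest` / `uri` properties of an Iceberg REST catalog.

```python
from typing import Dict, NamedTuple


class DropPlan(NamedTuple):
    """Simplified model of what happens for one DROP TABLE statement."""
    client_side_delete: bool  # Spark lists and deletes the files itself
    purge_requested: bool     # purge flag forwarded to the catalog


def plan_drop(purge: bool, is_rest_catalog: bool, rest_catalog_purge: bool) -> DropPlan:
    """Model the behavior described in this PR (hypothetical helper)."""
    if not purge:
        # Plain DROP TABLE: no files are deleted, no purge is requested.
        return DropPlan(client_side_delete=False, purge_requested=False)
    if is_rest_catalog and rest_catalog_purge:
        # Opt-in path: skip client-side deletion and delegate to the catalog.
        return DropPlan(client_side_delete=False, purge_requested=True)
    # Default path (and all non-REST catalogs): client-side purge,
    # followed by a drop request with the purge flag set to false.
    return DropPlan(client_side_delete=True, purge_requested=False)


def rest_catalog_conf(name: str, uri: str, catalog_purge: bool = False) -> Dict[str, str]:
    """Build Spark conf entries for a REST catalog (hypothetical helper)."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        # Property added by this PR; it defaults to false.
        f"{prefix}.rest.catalog-purge": str(catalog_purge).lower(),
    }
```

Each key/value pair returned by `rest_catalog_conf("demo", "http://localhost:8181", catalog_purge=True)` could be passed to `SparkSession.builder.config(key, value)` before the session is created; a later `DROP TABLE demo.db.tbl PURGE` would then follow the delegated path modeled by `plan_drop`.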
