felixschneider99 opened a new pull request, #15614:
URL: https://github.com/apache/iceberg/pull/15614

   Closes #11023
   Fixes #14743
   Continues work from #11317 (originally authored by @twuebi)
   
   ## Background
   
   Spark's `DROP TABLE ... PURGE` implementation uses a client-side approach:
   1. List and delete all files using Spark/FileIO
   2. Send a plain `DROP TABLE` request to the catalog (purge flag always `false`)
   
   This means the `PURGE` keyword in Spark SQL has **no effect** on REST catalogs:
   the purge request is never forwarded (see #14743). This breaks REST catalogs that
   require `purgeRequested=true` (e.g. AWS S3 Tables) and prevents implementing
   UNDROP/soft-delete features.
   
   As discussed in the Apache Iceberg Catalog Community Sync (Jan 2025, notes from
   @RussellSpitzer), the community agreed on a path forward:
   1. Add an opt-in flag to delegate purge to the REST catalog (this PR)
   2. Deprecate client-side purge for Spark + REST catalogs
   3. In Iceberg 2.0, make catalog-delegated purge the default
   
   ## Changes
   
   - **`CatalogProperties`**: Added `rest.catalog-purge` catalog property (`REST_CATALOG_PURGE`, default `false`)
   - **`CachingCatalog`**: Added `wrapped_is_instance()` for catalog type detection
   - **`SparkCatalog`**: Enhanced `purgeTable()` to detect REST catalogs and delegate the purge flag when `rest.catalog-purge=true`; added deprecation warning for client-side purge with REST catalogs
   - **`TestRestDropPurgeTable`**: New test validating purge delegation to a REST catalog
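
As a rough illustration of how the new property could be enabled, the sketch below assembles Spark configuration entries for a REST-backed Iceberg catalog. It assumes the usual `spark.sql.catalog.<name>.<property>` convention for passing catalog properties. The class name, the catalog name `demo`, and the URI are hypothetical placeholders, not part of this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RestPurgeConfigSketch {
  // Property name introduced by this PR; everything else here is illustrative.
  static final String REST_CATALOG_PURGE = "rest.catalog-purge";

  /** Builds Spark conf entries for a REST catalog with catalog-delegated purge enabled. */
  static Map<String, String> sparkConf(String catalogName, String restUri) {
    String prefix = "spark.sql.catalog." + catalogName;
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put(prefix, "org.apache.iceberg.spark.SparkCatalog");
    conf.put(prefix + ".type", "rest");
    conf.put(prefix + ".uri", restUri);
    // Opt in: the purge flag is forwarded to the catalog instead of deleting files client-side.
    conf.put(prefix + "." + REST_CATALOG_PURGE, "true");
    return conf;
  }

  public static void main(String[] args) {
    // Hypothetical catalog name and endpoint, for demonstration only.
    sparkConf("demo", "http://localhost:8181")
        .forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```

The printed `key=value` pairs can be passed to Spark as `--conf` arguments or set on a session builder. Leaving the property unset keeps the default of `false`.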
   
   ## Behavior
   
   When `rest.catalog-purge=true` is set in catalog properties:
   - Spark skips client-side file deletion
   - Spark sends a `DROP TABLE` request with `purgeRequested=true` to the REST catalog
   - The catalog is responsible for the actual file removal (enabling UNDROP/soft-delete)
   
   All non-REST catalog implementations are **unaffected**.
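
The branching described above can be sketched roughly as follows. This is a simplified model, not the actual `SparkCatalog.purgeTable()` code: `DemoCatalog` is a hypothetical stand-in that only records which calls were made, and the `clientSideDelete` entry stands for the Spark/FileIO file deletion step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PurgeDelegationSketch {
  static final String REST_CATALOG_PURGE = "rest.catalog-purge";

  /** Hypothetical stand-in for a catalog; it records calls instead of doing real work. */
  static class DemoCatalog {
    final boolean isRest;
    final List<String> calls = new ArrayList<>();

    DemoCatalog(boolean isRest) {
      this.isRest = isRest;
    }

    void dropTable(String table, boolean purge) {
      calls.add("dropTable(" + table + ", purge=" + purge + ")");
    }
  }

  /** Delegates purge to the catalog only for REST catalogs that opted in. */
  static void purgeTable(DemoCatalog catalog, Map<String, String> props, String table) {
    boolean delegate = catalog.isRest
        && Boolean.parseBoolean(props.getOrDefault(REST_CATALOG_PURGE, "false"));
    if (delegate) {
      // The catalog owns file removal; the client deletes nothing.
      catalog.dropTable(table, true);
    } else {
      // Existing behavior: delete files client-side, then send a plain drop.
      catalog.calls.add("clientSideDelete(" + table + ")");
      catalog.dropTable(table, false);
    }
  }

  public static void main(String[] args) {
    DemoCatalog rest = new DemoCatalog(true);
    purgeTable(rest, Map.of(REST_CATALOG_PURGE, "true"), "db.t");
    System.out.println(rest.calls);
  }
}
```

In this model a non-REST catalog takes the client-side branch even when the property is set, which mirrors the statement that non-REST implementations are unaffected.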



