SteveStevenpoor opened a new issue, #16670:
URL: https://github.com/apache/iceberg/issues/16670

   ### Feature Request / Improvement
   
   #  Motivation
   Iceberg has no SQL-level way to drop an existing partition from a table. 
`ALTER TABLE … DROP PARTITION` is a standard Hive/Spark/Trino DDL statement 
that users coming from those ecosystems expect to work, and not having it 
forces users to translate partition values back into predicates on source 
columns.
   
   ##  How users drop a partition today
   The only SQL option is a row-targeting DELETE keyed on the source column:
   ```sql
   -- "drop" January 2024 from a table partitioned by months(ts)
   DELETE FROM events
   WHERE ts >= TIMESTAMP '2024-01-01' AND ts <  TIMESTAMP '2024-02-01';
   ```
     Problems:
     - The table is partitioned by months(ts) (partition field ts_month), but 
the user has no SQL way to address that partition directly. So, they have to 
translate "January 2024" back into a half-open range on the source column.
   
   #  Proposed feature
   Add native support for the standard ALTER TABLE … DROP PARTITION syntax, 
addressing the partition field name (not the source column), in the same 
human-readable form Iceberg already writes to manifest partition paths:
   ```sql
   ALTER TABLE events DROP PARTITION (ts_year  = '2024');
   ALTER TABLE events DROP PARTITION (ts_month = '2024-01');
   ALTER TABLE events DROP PARTITION (ts_day   = '2024-02-15');
   ALTER TABLE events DROP PARTITION (ts_hour  = '2024-01-15-11');
   ALTER TABLE events DROP PARTITION (id_bucket = 0);
   ALTER TABLE events DROP PARTITION (region = 'eu', dt = '2024-01-01');
   ALTER TABLE events DROP PARTITION IF EXISTS (data = 'missing');
   
   -- Atomic multi-partition drop (SupportsAtomicPartitionManagement)
   ALTER TABLE events DROP PARTITION (data = 'b'), PARTITION (data = 'd');
   ```
   
   # Approach comparison
   
   | Concern | DELETE FROM … WHERE … (today) | ALTER TABLE … DROP PARTITION (proposed) |
   |----------|------------------------------|------------------------------------------|
   | SQL surface | Predicates | Standard Spark DDL |
   | Naming the partition | Source-column range arithmetic | Direct partition-field name (`ts_month`, `id_bucket`, …) |
   | Bucket / truncate partitions | Cannot express cleanly in SQL | `id_bucket = 0` names the partition directly |
   | Time-transform partitions | User computes a `[lo, hi)` range each time | Same `"yyyy-MM"` / `"yyyy-MM-dd"` / `"yyyy-MM-dd-HH"` form Iceberg writes to manifests |
   | Migration from Spark tables | Pipelines using `DROP PARTITION` need rewriting | Drop-in source compatible |
   
   #  Implementation sketch
   - `SparkTable` now implements `SupportsAtomicPartitionManagement`. 
`partitionSchema(...)` exposes the partition fields (`ts_day`, `id_bucket`, …) 
with their human-readable types: string for year/month/day/hour, integer for 
bucket, source type for identity/truncate.
   - `dropPartition(InternalRow)` builds a transform-aware predicate (e.g. 
`Expressions.equal(Expressions.day("ts"), 
Transforms.parseHumanDay("2024-02-15"))`) and commits it via 
`DeleteFiles.deleteFromRowFilter(...)`, a manifest-level operation.
   - New `Transforms.parseHumanYear/Month/Day/Hour(String)` → int static 
methods on `org.apache.iceberg.transforms.Transforms`, paired with the existing 
`TransformUtil.humanYear/Month/Day/Hour` forward conversions, centralize the 
human-string ↔ int-ordinal logic.
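
   The human-string ↔ int-ordinal conversion from the last bullet can be 
sketched in plain Java. This is a minimal standalone illustration, not the 
proposed patch: the class name `HumanPartitionValues` is hypothetical, the 
`parseHuman*` names are borrowed from the bullet above, and the sketch assumes 
four-digit, non-negative years with ordinals counted from 1970-01-01 (years, 
months, days, or hours since the epoch).
   ```java
   import java.time.LocalDate;
   import java.time.YearMonth;

   // Hypothetical standalone helper for illustration; not part of Iceberg.
   final class HumanPartitionValues {
     private HumanPartitionValues() {}

     // "2024" -> years since 1970
     static int parseHumanYear(String s) {
       return Integer.parseInt(s) - 1970;
     }

     // "2024-01" -> months since 1970-01
     static int parseHumanMonth(String s) {
       YearMonth ym = YearMonth.parse(s); // ISO year-month, e.g. "2024-01"
       return (ym.getYear() - 1970) * 12 + (ym.getMonthValue() - 1);
     }

     // "2024-02-15" -> days since 1970-01-01
     static int parseHumanDay(String s) {
       return Math.toIntExact(LocalDate.parse(s).toEpochDay());
     }

     // "2024-01-15-11" -> hours since 1970-01-01 00:00
     static int parseHumanHour(String s) {
       int split = s.lastIndexOf('-');
       if (split <= 0) {
         throw new IllegalArgumentException("Expected yyyy-MM-dd-HH, got: " + s);
       }
       long days = LocalDate.parse(s.substring(0, split)).toEpochDay();
       int hour = Integer.parseInt(s.substring(split + 1));
       if (hour < 0 || hour > 23) {
         throw new IllegalArgumentException("Invalid hour in: " + s);
       }
       return Math.toIntExact(days * 24 + hour);
     }

     // Forward conversion for months, used here only to show the round trip.
     static String humanMonth(int ordinal) {
       return YearMonth.of(1970, 1).plusMonths(ordinal).toString();
     }

     public static void main(String[] args) {
       System.out.println(parseHumanYear("2024"));          // 54
       System.out.println(parseHumanMonth("2024-01"));      // 648
       System.out.println(parseHumanDay("2024-02-15"));     // 19768
       System.out.println(parseHumanHour("2024-01-15-11")); // 473699
       System.out.println(humanMonth(648));                 // 2024-01
     }
   }
   ```
   Keeping each parser next to its forward conversion makes the round trip 
(`humanMonth(parseHumanMonth(s))` returning `s`) easy to cover in a unit test.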
   
   # Safety
   `deleteFromRowFilter(...)` already runs the strict-projection check at 
commit, so partial-file deletes are rejected by Iceberg core; the new API 
surface cannot silently remove rows outside the targeted partition.
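
   To illustrate that guarantee, here is a toy model of the "all rows or none" 
rule in plain Java. It is not Iceberg code: `StrictDeleteSketch`, `DataFile`, 
and `filesToDelete` are hypothetical names, each file is reduced to the minimum 
and maximum epoch day of its `ts` values, and the filter is a half-open day 
range. The real check works on partition tuples and column metrics, but the 
outcome has the same shape: a file is removed only when every row matches, and 
a partial match fails the operation.
   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Toy model for illustration only; none of these names exist in Iceberg.
   final class StrictDeleteSketch {
     // A data file summarized by the min/max epoch day of its ts column.
     static final class DataFile {
       final String path;
       final long minDay;
       final long maxDay;

       DataFile(String path, long minDay, long maxDay) {
         this.path = path;
         this.minDay = minDay;
         this.maxDay = maxDay;
       }
     }

     // Returns the paths of files fully covered by the filter [lo, hi).
     // Throws if any file matches the filter only partially.
     static List<String> filesToDelete(List<DataFile> files, long lo, long hi) {
       List<String> deleted = new ArrayList<>();
       for (DataFile f : files) {
         boolean allMatch = f.minDay >= lo && f.maxDay < hi;
         boolean noneMatch = f.maxDay < lo || f.minDay >= hi;
         if (allMatch) {
           deleted.add(f.path);
         } else if (!noneMatch) {
           throw new IllegalStateException(
               "Cannot delete file where some, but not all, rows match: " + f.path);
         }
       }
       return deleted;
     }
   }
   ```
   With one file per month, a filter covering all of January 2024 (epoch days 
19723 up to but not including 19754) selects only the January file, while a 
filter that ends mid-January is rejected instead of deleting part of a file.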
   
   
   ### Query engine
   
   Spark
   
   ### Willingness to contribute
   
   - [x] I can contribute this improvement/feature independently
   - [ ] I would be willing to contribute this improvement/feature with 
guidance from the Iceberg community
   - [ ] I cannot contribute this improvement/feature at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

