Re: [PR] feat(datafusion): add TRUNCATE TABLE and DROP PARTITION SQL support [paimon-rust]

via GitHub Mon, 27 Apr 2026 22:20:42 -0700


jerry-024 commented on code in PR #292:
URL: https://github.com/apache/paimon-rust/pull/292#discussion_r3151738639



##########
crates/integrations/datafusion/src/sql_handler.rs:
##########
@@ -973,6 +1057,60 @@ fn extract_options(opts: &CreateTableOptions) -> 
DFResult<Vec<(String, String)>>
         .collect()
 }
 
+/// Parse partition expressions (`col = val, ...`) into partition value maps
+/// suitable for `TableCommit::truncate_partitions`.
+///
+/// All expressions are treated as belonging to a single partition 
specification.
+/// For multiple partitions, callers should invoke this once per partition 
clause.
+fn parse_partition_values(
+    exprs: &[SqlExpr],
+    all_fields: &[PaimonDataField],
+    partition_keys: &[String],
+) -> DFResult<Vec<HashMap<String, Option<Datum>>>> {
+    let field_map: HashMap<&str, &PaimonDataField> =
+        all_fields.iter().map(|f| (f.name(), f)).collect();
+
+    let mut partition = HashMap::new();
+    for expr in exprs {
+        let (col_name, val_expr) = match expr {
+            SqlExpr::BinaryOp {
+                left,
+                op: datafusion::sql::sqlparser::ast::BinaryOperator::Eq,
+                right,
+            } => {
+                let col = match left.as_ref() {
+                    SqlExpr::Identifier(ident) => ident.value.clone(),
+                    other => {
+                        return Err(DataFusionError::Plan(format!(
+                            "Expected column name in partition spec, got: 
{other}"
+                        )))
+                    }
+                };
+                (col, right.as_ref())
+            }
+            other => {
+                return Err(DataFusionError::Plan(format!(
+                    "Expected 'column = value' in partition spec, got: {other}"
+                )))
+            }
+        };
+
+        if !partition_keys.iter().any(|k| k == &col_name) {
+            return Err(DataFusionError::Plan(format!(
+                "Column '{col_name}' is not a partition column"
+            )));
+        }
+
+        let field = field_map.get(col_name.as_str()).ok_or_else(|| {
+            DataFusionError::Plan(format!("Column '{col_name}' not found in 
table schema"))

Review Comment:
   **HIGH: silent wrong-partition deletion on incomplete specs.**
   
   This builds `partition` only from keys the user supplied. The map flows 
through `TableCommit::truncate_partitions` → `partitions_to_bytes`:
   
   ```rust
   let datum = p.get(key).cloned().flatten();  // missing key → None
   ```
   
   `None` is encoded into the partition filter as `IS NULL`. So on a `(dt, 
region)` partitioned table, `DROP PARTITION (dt = '2026-04-28')` becomes a 
filter `dt='2026-04-28' AND region IS NULL` — it does **not** delete all 
`dt='2026-04-28'` partitions like Hive/Spark prefix semantics would. If a 
NULL-region partition exists, it gets deleted (silent wrong-target). If not, 
silent no-op. Either way, user intent is mishandled with no error.
   
   Two acceptable fixes:
   - **Reject incomplete specs explicitly:** after the loop, `if 
partition.len() != partition_keys.len()` → `Err` listing the missing keys.
   - **Implement true prefix-match deletion:** requires a partition-filter 
primitive that matches on a subset of keys; not just a downstream-API tweak.
   
   For reference, Java's Spark 
[`PaimonPartitionManagement.toPaimonPartitions`](https://github.com/apache/paimon/blob/master/paimon-spark/paimon-spark-common/src/main/scala/org/apache/paimon/spark/PaimonPartitionManagement.scala)
 does `require(partitionFieldCount <= partitionKeys.length)` and projects the 
partition row type so prefix specs work as Hive expects.



##########
crates/integrations/datafusion/src/sql_handler.rs:
##########
@@ -556,6 +570,76 @@ impl PaimonSqlHandler {
         crate::merge_into::ok_result(&self.ctx, row_count)
     }
 
+    async fn handle_truncate_table(&self, truncate: &Truncate) -> 
DFResult<DataFrame> {
+        if truncate.table_names.len() > 1 {
+            return Err(DataFusionError::Plan(
+                "TRUNCATE TABLE does not support multiple tables".to_string(),
+            ));
+        }
+        let target = truncate.table_names.first().ok_or_else(|| {
+            DataFusionError::Plan("TRUNCATE TABLE requires a table 
name".to_string())
+        })?;
+        let identifier = self.resolve_table_name(&target.name)?;
+        let table = self
+            .catalog
+            .get_table(&identifier)
+            .await
+            .map_err(to_datafusion_error)?;
+
+        let wb = table.new_write_builder();
+        let commit = wb.new_commit();

Review Comment:
   **MEDIUM: empty `PARTITION ()` clause falls through to a full-table 
truncate.**
   
   If `truncate.partitions` is `Some(empty_vec)` (whatever AST shape sqlparser 
produces for a degenerate parser edge case), this branch is skipped and 
execution drops into `commit.truncate_table()` below — the entire table is 
wiped despite the user writing a `PARTITION` clause. Opposite of intent.
   
   Suggest tightening:
   
   ```rust
   if let Some(partitions) = &truncate.partitions {
       if partitions.is_empty() {
           return Err(DataFusionError::Plan(
               "PARTITION clause requires at least one column = 
value".to_string(),
           ));
       }
       // ... existing branch
       return ok_result(...);
   }
   commit.truncate_table().await?;
   ```
   
   This way only `TRUNCATE TABLE t` (no PARTITION clause at all, i.e. 
`truncate.partitions` is `None`) reaches `truncate_table()`.



##########
crates/integrations/datafusion/src/sql_handler.rs:
##########
@@ -326,6 +330,16 @@ impl PaimonSqlHandler {
                         }
                     }
                 }
+                // DropPartitions is a data operation (not a schema change), 
so we handle it
+                // separately and return early — it cannot be combined with 
schema changes.

Review Comment:
   **MEDIUM: `if_exists` dropped at three levels — none honored.**
   
   1. **Inner `AlterTableOperation::DropPartitions { if_exists: _ }`** (this 
match arm): partition-level IF EXISTS is explicitly bound and discarded. The 
doc comment claims this "matches IF EXISTS semantics" because the underlying 
overwrite is a no-op on missing partitions (verified — `truncate_partitions` 
early-returns on empty resolved entries). But that makes plain `DROP PARTITION` 
behave identically to `DROP PARTITION IF EXISTS`. Spark's 
`AlterTableDropPartitionCommand` errors when IF EXISTS is omitted and the 
partition doesn't exist — this PR diverges silently.
   
   2. **Outer `Statement::AlterTable { if_exists, .. }`**: returning early from 
this branch into `handle_drop_partitions` skips whatever outer IF EXISTS 
handling the rest of `handle_alter_table` does. `ALTER TABLE IF EXISTS 
missing_table DROP PARTITION (...)` will fail with a hard error from 
`catalog.get_table(...)` instead of being a silent no-op.
   
   3. **`Statement::Truncate { if_exists, .. }`** (handler at line 570): 
sqlparser parses the flag (confirmed in `sqlparser::ast::Truncate`), but 
`handle_truncate_table` never reads it. `TRUNCATE TABLE IF EXISTS 
missing_table` errors instead of no-oping.
   
   Suggested fix: at each layer, read the flag, and on `get_table` returning 
NotFound (or by `list_tables` first), short-circuit to `ok_result` when 
`if_exists` is set, otherwise propagate the error.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] feat(datafusion): add TRUNCATE TABLE and DROP PARTITION SQL support [paimon-rust]

Reply via email to