[I] [Bug] Spark DELETE FROM should not support first-row engine table [paimon]

via GitHub Mon, 01 Dec 2025 01:59:22 -0800


zhongyujiang opened a new issue, #6718:
URL: https://github.com/apache/paimon/issues/6718


   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) 
and found nothing similar.
   
   
   ### Paimon version
   
   master
   
   ### Compute Engine
   
   spark
   
   ### Minimal reproduce step
   
   ```java
       @Test
       public void testFirstRow() {
           spark.sql(
                   "CREATE TABLE T (a INT NOT NULL, b INT) TBLPROPERTIES"
                           + " ('file.format'='avro', 'merge-engine'='first-row',"
                           + " 'primary-key'='a', 'bucket'='1')");
           spark.sql("insert into T values(1, 1), (2, 2)");
           spark.sql("DELETE FROM T WHERE a = 1");
           // Returns an empty result: the undeleted row (2, 2) is missing too.
           spark.sql("select * from T").show(false);
       }
   ```
   
   ### What doesn't meet your expectations?
   
   By definition, Paimon's first-row merge engine does not support deletes or 
updates (see the 
[documentation](https://paimon.apache.org/docs/master/primary-key-table/merge-engine/first-row/)),
 and `FirstRowMergeFunction` throws an exception when a retract row is added.
   
   However, Spark can still delete records from a first-row merge-engine table 
through the DELETE command, which conflicts with that definition.
   
   In addition, Spark performs a copy-on-write (CoW) rewrite when deleting 
records, and the newly written files end up at level 0 (L0). As a result, the 
records that were not deleted become unqueryable until a compaction merges 
those files.
   
   
   I think we should forbid Spark from running DELETE / UPDATE / MERGE against 
first-row tables, just like we already reject retract rows in 
`FirstRowMergeFunction`.
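
To make the proposal concrete, here is a rough sketch of the kind of check I have in mind. It is only an illustration: `FirstRowWriteGuard` and `checkRowLevelOpAllowed` are hypothetical names, not existing Paimon classes or methods, and the real check would presumably live wherever the Spark integration plans row-level commands. The option key and value are the ones from the `TBLPROPERTIES` in the reproduce step.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical guard for illustration only; not an existing Paimon class. */
public class FirstRowWriteGuard {

    // Option key and value as used in the TBLPROPERTIES of the reproduce step.
    static final String MERGE_ENGINE_KEY = "merge-engine";
    static final String FIRST_ROW = "first-row";

    /**
     * Rejects a row-level operation (DELETE / UPDATE / MERGE) when the table
     * options select the first-row merge engine; otherwise does nothing.
     */
    public static void checkRowLevelOpAllowed(
            Map<String, String> tableOptions, String operation) {
        String engine = tableOptions.get(MERGE_ENGINE_KEY);
        if (FIRST_ROW.equalsIgnoreCase(engine)) {
            throw new UnsupportedOperationException(
                    operation + " is not supported on tables with merge-engine = first-row.");
        }
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("primary-key", "a");
        options.put("merge-engine", "first-row");
        try {
            checkRowLevelOpAllowed(options, "DELETE");
        } catch (UnsupportedOperationException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing fast at planning time would surface the conflict to the user before any files are rewritten, rather than leaving the table in a state that needs a compaction to become readable again.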
   
   cc @Zouxxyy @JingsongLi Can you take a look when you have time? Thanks!
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!


