zhongyujiang opened a new issue, #6718: URL: https://github.com/apache/paimon/issues/6718
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Paimon version

master

### Compute Engine

spark

### Minimal reproduce step

```java
@Test
public void testFirstRow() {
    spark.sql(
            "CREATE TABLE T (a INT NOT NULL, b INT) TBLPROPERTIES"
                    + " ('file.format'='avro', 'merge-engine'='first-row', 'primary-key'='a', 'bucket'='1')");
    spark.sql("insert into T values(1, 1), (2, 2)");
    spark.sql("DELETE FROM T WHERE a = 1");
    spark.sql("select * from T").show(false); // empty results
}
```

### What doesn't meet your expectations?

By definition, Paimon's first-row merge engine does not support delete or update ([reference](https://paimon.apache.org/docs/master/primary-key-table/merge-engine/first-row/)), and `FirstRowMergeFunction` throws an exception when retract rows are added. However, Spark can still delete records from a first-row merge-engine table through the DELETE command, which contradicts that contract.

In addition, Spark performs a copy-on-write (CoW) rewrite when deleting records, and the newly written files end up at level 0 (L0). As a result, the records that were not deleted become unqueryable until a compaction merges them. In the reproduce step above, the final query returns no rows even though the row with `a = 2` was never deleted.

I think we should forbid Spark from running DELETE / UPDATE / MERGE to delete or update records on first-row tables, just as `FirstRowMergeFunction` rejects retract rows.

cc @Zouxxyy @JingsongLi Can you take a look when you have time? Thanks!

### Anything else?

_No response_

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!
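To illustrate the proposal to forbid DELETE / UPDATE / MERGE on first-row tables, here is a minimal sketch of such a guard. It is an assumption, not existing Paimon code: `FirstRowWriteGuard` and `check` are hypothetical names, and the sketch works on a plain map of table options (using the `merge-engine` key from the reproduce step) rather than on Paimon's or Spark's real classes. An actual fix would need to hook into wherever the Spark integration plans these commands.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical guard for illustration only; not part of Paimon's API. */
final class FirstRowWriteGuard {

    // Option key and value as written in the TBLPROPERTIES of the reproduce step.
    private static final String MERGE_ENGINE_KEY = "merge-engine";
    private static final String FIRST_ROW = "first-row";

    // Row-level commands that would have to retract or overwrite existing rows.
    private static final Set<String> ROW_LEVEL_COMMANDS = Set.of("DELETE", "UPDATE", "MERGE");

    private FirstRowWriteGuard() {}

    /** Returns true when the table options select the first-row merge engine. */
    static boolean isFirstRow(Map<String, String> tableOptions) {
        // equalsIgnoreCase(null) is false, so a missing option is treated as "not first-row".
        return FIRST_ROW.equalsIgnoreCase(tableOptions.get(MERGE_ENGINE_KEY));
    }

    /** Throws if a row-level command targets a first-row table; otherwise does nothing. */
    static void check(Map<String, String> tableOptions, String command) {
        String normalized = command.trim().toUpperCase(Locale.ROOT);
        if (isFirstRow(tableOptions) && ROW_LEVEL_COMMANDS.contains(normalized)) {
            throw new UnsupportedOperationException(
                    normalized + " is not supported on tables with merge-engine 'first-row'.");
        }
    }
}
```

With a check like this in place, the `DELETE FROM T WHERE a = 1` statement in the reproduce step would fail fast with an explicit error instead of silently producing L0 files that hide the remaining rows.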
