sandyfog opened a new issue, #6862: URL: https://github.com/apache/paimon/issues/6862
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Paimon version

1.2.0

### Compute Engine

Flink 1.20

### Minimal reproduce step

## Create the source changelog table

```
CREATE TABLE testlog (
    id STRING PRIMARY KEY NOT ENFORCED,
    f1 STRING,
    -- DELETE is a reserved keyword in Flink SQL, so the column name is quoted
    `delete` INT
) WITH (
    'merge-engine' = 'deduplicate',
    'changelog-producer' = 'lookup'
);

-- seed data
INSERT INTO testlog VALUES ('11', '11', 0), ('12', '12', 0);

-- update id=12 to be logically deleted
INSERT INTO testlog VALUES ('12', '12', 1), ('13', '13', 0);
```

## Query source table

```
SELECT * FROM testlog;

op  id  f1  delete
+I  11  11  0
+I  12  12  0
-U  12  12  0
+U  12  12  1
+I  13  13  0
```

## Query source table with filter

```
SELECT * FROM testlog WHERE `delete` = 0;

op  id  f1  delete
+I  11  11  0
+I  12  12  0
-U  12  12  0
+I  13  13  0
```

Note: After applying the filter delete = 0, the +U 12 12 1 message is dropped entirely. Consequently, after the initial +I, the only message about id = 12 that reaches the downstream Paimon table is -U 12 12 0.

## Filter out logically-deleted rows and write into a partial-update table

```
CREATE TABLE testlog01 (
    id STRING PRIMARY KEY NOT ENFORCED,
    f1 STRING,
    `delete` INT
) WITH (
    'merge-engine' = 'partial-update',
    'partial-update.remove-record-on-delete' = 'true',
    'changelog-producer' = 'lookup'
);

-- only rows that are not logically deleted are forwarded to the sink
INSERT INTO testlog01 SELECT * FROM testlog WHERE `delete` = 0;
```

## Query the target table

```
SELECT * FROM testlog01;
```

### What doesn't meet your expectations?

## Expected result

```
op  id  f1  delete
+I  11  11  0
+I  12  12  0
-D  12  12  0
+I  13  13  0
```

(id=12 should be deleted, because the last message for it is -U and no +U reaches the sink.)

## Actual result

```
op  id  f1  delete
+I  11  11  0
+I  12  12  0
-U  12  12  0
+U  12  12  0
+I  13  13  0
```

The stale row id=12 is still present in the target table, which breaks data correctness.

### Anything else?

_No response_

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!

-- This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
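
The filtering effect described in the issue can be illustrated outside Flink with a minimal Python sketch. It only models a row-level filter applied to each changelog message independently and takes the update row to be `+U 12 12 1`, as the issue's note states. It is not Paimon's or Flink's implementation, and the names `Change`, `filter_changelog` and `orphaned_retractions` are hypothetical helpers introduced for illustration.

```python
from typing import Callable, List, NamedTuple


class Change(NamedTuple):
    """One changelog message: a row kind plus the three table columns."""
    op: str      # "+I", "-U", "+U" or "-D"
    id: str
    f1: str
    delete: int


# Changelog of `testlog` as listed in the unfiltered query of the issue.
CHANGELOG = [
    Change("+I", "11", "11", 0),
    Change("+I", "12", "12", 0),
    Change("-U", "12", "12", 0),
    Change("+U", "12", "12", 1),
    Change("+I", "13", "13", 0),
]


def filter_changelog(changes: List[Change],
                     predicate: Callable[[Change], bool]) -> List[Change]:
    """Keep each message whose row satisfies the predicate, ignoring its row kind."""
    return [c for c in changes if predicate(c)]


def orphaned_retractions(changes: List[Change]) -> List[Change]:
    """Return -U messages that are never followed by a +U or +I for the same key."""
    pending = {}
    for c in changes:
        if c.op == "-U":
            pending[c.id] = c
        elif c.op in ("+U", "+I"):
            pending.pop(c.id, None)
    return list(pending.values())


filtered = filter_changelog(CHANGELOG, lambda c: c.delete == 0)
orphans = orphaned_retractions(filtered)

for c in filtered:
    print(c.op, c.id, c.f1, c.delete)
print("orphaned retractions:", orphans)
```

In this model the unfiltered changelog has no orphaned retraction, while the filtered one leaves `-U 12 12 0` without a matching `+U`, which is the message the sink has to interpret.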
